Abstract

This paper considers a stochastic SIR (susceptible → infective → removed) epidemic model in which individuals may make infectious contacts in two ways: within 'households' (which for ease of exposition are assumed to have equal size) and along the edges of a random graph describing additional social contacts. Heuristically motivated branching process approximations are described, which lead to a threshold parameter for the model and methods for calculating the probability of a major outbreak, given few initial infectives, and the expected proportion of the population who are ultimately infected by such a major outbreak. These approximate results are shown to be exact as the number of households tends to infinity by proving associated limit theorems. Moreover, simulation studies indicate that these asymptotic results provide good approximations for modestly sized finite populations. The extension to unequal sized households is discussed briefly.
1 Introduction

Epidemic models which include some element of realistic population structure have been the subject of a considerable amount of recent study, in recognition of the fact that the classical homogeneously mixing models are quite unrealistic for all but the smallest of populations.

One approach to this has been to allow local contacts of some kind, modelling contacts which occur on a regular basis, while maintaining the 'well-mixed' global contacts that model chance interactions with random members of the population. A common form for these local contacts arises from partitioning the population into households, where local contacts can occur only between individuals in the same household (see, for example, Becker and Dietz (1995) and Ball et al. (1997)). This can be extended to the overlapping groups model, where the population may be partitioned in more than one way (for example, by household and by workplace), with local interactions taking place at (possibly) different rates within groups of the different partitions; see Ball and Neal (2002). Another mode of local interaction is described by the so-called great circle model (Ball et al., 1997; Ball and Neal, 2002, 2003), where the population is spread around a circle and individuals have local contact only with their nearest neighbours. This model is closely related to 'small-world' models (Watts and Strogatz, 1998), which have received considerable attention, particularly in the physics literature.

Another way of accounting for the inhomogeneous nature of interactions is by using random graphs to model social networks (see, for example, Andersson (1997, 1998, 1999), Newman (2002), Durrett (2006), Kenah and Robins (2007) and Britton et al. (2008)). Perhaps the most important aspect of these random graph models is that they incorporate a specified degree distribution, the degree of a node in the graph corresponding to the number of other members of the population with whom an individual can possibly make infectious contact. These models have been extended to also incorporate 'casual contacts' by way of classical homogeneous mixing; see Kiss et al. (2006) and Ball and Neal (2008).
In this paper we investigate a model for an SIR (susceptible → infective → removed) epidemic in a closed finite population, which draws together the main aspects of the generalisations of the standard homogeneously mixing model described above. We consider a population grouped into households, within which infectious contacts occur at a given per-pair rate, and individuals also make global contacts along the edges of a random graph over the whole population. We use branching process approximations to derive (i) a threshold parameter, which determines whether a disease with just a few initial infectives can become established and infect a non-negligible proportion of the population (an event we call a major outbreak); (ii) the probability that a major outbreak occurs; and (iii) the expected proportion of the population that is infected by a major outbreak. These results are approximations that become exact in the limit as the size of the population becomes large in an appropriate way.

A feature of our model is that there is clustering present in the network of possible contacts, roughly meaning that there are significant numbers of triangles (and other short cycles) present in the network. This is an important aspect, as the presence of triangles captures the phenomenon of people having mutual friends. The effect of such clustering in random networks in an epidemiological setting has been considered, in different models, by Trapman (2007) and Britton et al. (2008).

In the remainder of the paper we first describe, in Section 2, the full detail of our model. Then in Section 3 we give the ideas behind the above-mentioned branching process approximations. In Section 4 we derive explicit formulae which allow us to calculate the quantities of interest for two important special cases, and in Section 5 we give some brief numerical examples, including a demonstration that our asymptotic results give good approximations even for moderately sized finite populations. In Section 6 we rigorously establish the branching process approximations by proving related limit theorems as the population size tends to infinity. The paper concludes with a brief discussion in Section 7.

2 Model 

We consider a closed population of $m$ households, each of $n$ individuals, and construct the network of possible global contacts using the 'configuration model' (as in Durrett (2006, Chapter 3)) as follows. First assign to each individual a number of half-edges, these numbers being independent realisations of a random variable $D$ (the degree distribution) with $P(D = k) = p_k$, $k = 0, 1, \ldots$. Conditional on the total number of half-edges being even, we then pair these half-edges with each other uniformly at random, whence each such pair of half-edges forms an edge in the (random) graph describing the possible global contacts. We denote by $\mu_D$ and $\sigma_D^2$ the mean and variance of the distribution $D$ and assume that both of these quantities are finite. We also note for later reference that if we follow an edge from one vertex to another, then the degree of the second vertex has the size-biased distribution $\tilde{D}$, where $P(\tilde{D} = k) = k p_k / \mu_D$, $k = 1, 2, \ldots$. This is because in the construction of the graph the half-edges are paired uniformly at random, and a given vertex of degree $k$ owns $k$ half-edges, so it is $k$ times as likely to be reached by following an edge as a given vertex of degree 1. By the degree of an individual we mean the number of individuals adjacent to it in the network of global contacts, not counting those in its own household.
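As an illustration of the size-biasing described above, the following Python sketch computes the mass function of $\tilde{D}$ from a mass function of $D$ truncated at a level `kmax`. The truncation and the helper names (`poisson_pmf`, `size_biased_pmf`, `mean_and_variance`) are our own illustrative assumptions. For a Poisson degree distribution $\tilde{D} - 1$ is again Poisson with the same mean, which gives a convenient check, and $E[\tilde{D}] = \mu_D + \sigma_D^2/\mu_D$ holds for any mass function.

```python
import math
from typing import List, Tuple


def poisson_pmf(mean: float, kmax: int) -> List[float]:
    """P(D = k) for k = 0..kmax when D ~ Poisson(mean), truncated at kmax."""
    return [math.exp(-mean) * mean**k / math.factorial(k) for k in range(kmax + 1)]


def size_biased_pmf(pmf: List[float]) -> List[float]:
    """Return P(D~ = k) = k p_k / mu_D for k = 0..len(pmf) - 1."""
    mu = sum(k * p for k, p in enumerate(pmf))
    return [k * p / mu for k, p in enumerate(pmf)]


def mean_and_variance(pmf: List[float]) -> Tuple[float, float]:
    mu = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mu) ** 2 * p for k, p in enumerate(pmf))
    return mu, var


if __name__ == "__main__":
    p = poisson_pmf(5.0, 60)
    q = size_biased_pmf(p)
    mu, var = mean_and_variance(p)
    mu_tilde, _ = mean_and_variance(q)
    # E[D~] equals mu_D + sigma_D^2 / mu_D, which is 6 for Poisson(5)
    print(mu_tilde, mu + var / mu)
```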

Note that there may be some imperfections in the graph, in the form of parallel edges and self-loops. However, our assumption that $\sigma_D^2 < \infty$ ensures that, as $m \to \infty$, the number of these imperfections in the network of global contacts converges in distribution to a Poisson random variable whose mean is a function of $(\mu_D, \sigma_D^2)$ (Durrett, 2006, Theorem 3.1.2). By treating the households as macro-individuals, with degree distribution given by the sum of $n$ independent copies of $D$, it follows that the numbers of parallel edges between households and household self-loops also converge in distribution to Poisson random variables as $m \to \infty$. Thus the probability that these imperfections are absent from the graph is bounded away from zero as $m \to \infty$, and consequently (cf. Janson (2009)) our asymptotic results continue to hold if the graph is conditioned on having no such imperfections.
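A minimal Python sketch of the configuration model construction, assuming Poisson degrees purely for illustration (any $D$ with finite variance would do). The function names are hypothetical helpers rather than anything from the paper. Half-edges are paired by shuffling the list of stubs and pairing consecutive entries, which yields a uniformly random pairing; self-loops and surplus parallel edges are then counted both between individuals and between households, with individual $v$ placed in household $\lfloor v/n \rfloor$.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw from Poisson(mean) by Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_degrees(n_vertices: int, mean: float, rng: random.Random) -> List[int]:
    """I.i.d. Poisson degrees, redrawn until the total number of half-edges is even."""
    while True:
        degrees = [poisson_sample(mean, rng) for _ in range(n_vertices)]
        if sum(degrees) % 2 == 0:
            return degrees


def configuration_model(degrees: List[int], rng: random.Random) -> List[Edge]:
    """Pair half-edges uniformly at random: shuffle the stubs, then pair neighbours."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("total degree must be even")
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]


def count_imperfections(edges: List[Edge]) -> Tuple[int, int]:
    """Return (number of self-loops, number of surplus parallel edges)."""
    loops = sum(1 for u, v in edges if u == v)
    multiplicity = Counter((min(u, v), max(u, v)) for u, v in edges if u != v)
    parallel = sum(c - 1 for c in multiplicity.values())
    return loops, parallel


if __name__ == "__main__":
    rng = random.Random(1)
    m, n = 1000, 3  # illustrative numbers of households and household size
    edges = configuration_model(sample_degrees(m * n, 5.0, rng), rng)
    print("between individuals:", count_imperfections(edges))
    # Household-level self-loops are global edges joining two members of one household.
    household_edges = [(u // n, v // n) for u, v in edges]
    print("between households:", count_imperfections(household_edges))
```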

When an infective individual makes infectious contact with a susceptible individual, the susceptible becomes infective and remains so for a random period of time distributed according to a non-negative real-valued random variable $I$, which we call the infectious period and specify by its Laplace transform $\phi(\theta) = E[e^{-\theta I}]$, $\theta \ge 0$. An infective individual makes infectious contact with each other member of his/her household at the points of a Poisson process with rate $\lambda_L$, and similarly with each individual he/she is adjacent to in the network of global contacts at rate $\lambda_G$. To be emphatic, both $\lambda_L$ and $\lambda_G$ are per-pair rates, so an infectious individual of degree $k$ makes infectious contacts at overall rate $\lambda_L(n-1) + \lambda_G k$. In particular, since conditional on $I$ the first point of a rate-$\lambda_G$ Poisson process falls in $[0, I]$ with probability $1 - e^{-\lambda_G I}$, an infective makes infectious contact with a given global neighbour with probability $1 - \phi(\lambda_G)$. As usual, all Poisson processes and infectious periods are assumed to be mutually independent.
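The probability $1 - \phi(\lambda)$ that an infective makes infectious contact with a given neighbour at per-pair rate $\lambda$ recurs throughout the calculations. The Python sketch below checks it by simulation under two illustrative assumptions about $I$: an exponential infectious period with rate $\gamma$, for which $\phi(\theta) = \gamma/(\gamma + \theta)$, and the zero-or-infinite period with $P(I = \infty) = p$, for which $1 - \phi(\lambda) = p$ when $\lambda > 0$. The function names are hypothetical.

```python
import math
import random
from typing import Callable


def contact_probability_mc(
    lam: float,
    sample_I: Callable[[random.Random], float],
    n_sims: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of P(an infective contacts a given neighbour).

    The first point of a rate-lam Poisson process is Exp(lam) distributed,
    so contact occurs during the infectious period iff that time is below I."""
    hits = 0
    for _ in range(n_sims):
        infectious_period = sample_I(rng)
        if rng.expovariate(lam) < infectious_period:
            hits += 1
    return hits / n_sims


def phi_exponential(theta: float, rate: float) -> float:
    """Laplace transform E[exp(-theta I)] for I ~ Exp(rate)."""
    return rate / (rate + theta)


if __name__ == "__main__":
    rng = random.Random(2024)
    lam_G, gamma = 0.5, 1.0  # illustrative values
    est = contact_probability_mc(lam_G, lambda r: r.expovariate(gamma), 100_000, rng)
    print(est, 1.0 - phi_exponential(lam_G, gamma))  # both close to 1/3
```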

For ease of presentation, we assume that an epidemic is initiated by a single infective individual within the population, either a given specific individual or an individual chosen uniformly at random from the population. Our assumption that all households are of the same size is also made for ease of presentation although, as indicated in Section 7, our results generalise easily to incorporate unequal household sizes.
3 Heuristics and description of main results



We now give informal descriptions of the branching process approximations we use, first to approximate the early stages of an outbreak, leading to a threshold parameter and a method of calculating the probability of a major outbreak, and second to approximate the expected relative final size of (i.e. the proportion of the population infected by) a major outbreak. These approximations become exact in the limit as the number of households $m \to \infty$, with the household size $n$ held fixed.

3.1 Forward processes 

The branching process we use to analyse the early stages of the epidemic 
approximates the number of households which become infected in the course 
of the epidemic. Because we are interested only in the final outcome of the 
epidemic and not its precise time evolution we can think of the epidemic as 
evolving in the following way (see, for example, Pellis et al. (2008)). We first 
consider the epidemic spreading only within the household containing the 
initial infective (the local epidemic that it initiates) and then consider the 
number of individuals infected via global infectious contacts made by those 
infected by the local epidemic. Because of the way the network is constructed, 
in the early stages of the epidemic it is highly likely that these globally 
contacted individuals are all in distinct households (this being critical for 
the branching process approximation). We then consider each newly infected 
household in the same manner: local epidemic followed by global infections. 
Again in the early stages it is highly likely that those infected by such global infectious contacts are in distinct households and, furthermore, that they are in previously uninfected households. We can therefore view the process as a branching process whose individuals are households: the children (offspring) of a household are the households containing individuals infected through global contacts made by members of the local epidemic in that household, that local epidemic being initiated by a single infective.

Note that the offspring distribution of the above branching process is different for the initial (i.e. zeroth) generation than for subsequent generations. In subsequent generations the initial infective in a household has been infected by one of its global neighbours, so the number of uninfected global neighbours of this individual is equal in distribution to $\tilde{D} - 1$. In the zeroth generation, however, the initial infective in the household is the initial infective in the whole population, and its degree has distribution $D$ if the initial infective is chosen uniformly at random from the population, or is a fixed constant $d$ if a specific individual is chosen to be the initial infective. We therefore define the random variable $\hat{C}$ to be the number of global neighbours infected by members of the initial infective's household and $\tilde{C}$ to be the number infected by the household of a single infective that was infected by a global neighbour. Our branching process approximation then has a single ancestor (since the epidemic starts with one initial infective), offspring distribution $\hat{C}$ in the initial generation and offspring distribution $\tilde{C}$ in subsequent generations. Throughout the paper, we denote a branching process of this type by $\mathrm{BP}(1, \hat{C}, \tilde{C})$, or by $\mathrm{BP}(1, \hat{c}, \tilde{c})$, where $\hat{c} = (\hat{c}_0, \hat{c}_1, \ldots)$ and $\tilde{c} = (\tilde{c}_0, \tilde{c}_1, \ldots)$ are the mass functions of $\hat{C}$ and $\tilde{C}$, respectively.

The above branching process approximation of the epidemic is made fully rigorous in Section 6.4.1, where it is shown that, as $m \to \infty$, the total number of households infected by the epidemic converges in distribution to the total progeny of the branching process (see Theorem 1). Thus, whether or not the epidemic can 'take off' and lead to a major outbreak is determined by whether or not the branching process is supercritical (i.e. whether or not $R_* = E[\tilde{C}] > 1$). Further, by standard branching process theory, the probability of such a major outbreak is given by $1 - f_{\hat{C}}(\sigma)$, where $\sigma$ is the smallest solution of $f_{\tilde{C}}(s) = s$ in $[0, 1]$, and $f_{\hat{C}}(s) = E[s^{\hat{C}}]$ and $f_{\tilde{C}}(s) = E[s^{\tilde{C}}]$ (for $s \in [0, 1]$) denote the probability generating functions (PGFs) of $\hat{C}$ and $\tilde{C}$, respectively. Here $\sigma$ is the probability that the line of descent of a single household in a subsequent generation dies out, so the whole process dies out with probability $E[\sigma^{\hat{C}}] = f_{\hat{C}}(\sigma)$. (Here and henceforth we denote by $f_X(\cdot)$ the PGF of the random variable $X$.) Calculation of $f_{\hat{C}}(s)$ and $f_{\tilde{C}}(s)$ is considered in Section 4.1.
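The computation of the major outbreak probability from the two PGFs can be sketched in Python as follows. It relies on the standard fact that iterating $s \mapsto f_{\tilde{C}}(s)$ from $s = 0$ increases monotonically to the smallest fixed point $\sigma$. The Poisson(2) offspring law in the usage example is an arbitrary illustrative assumption, not something derived from the household model, and the function names are hypothetical.

```python
import math
from typing import Callable

PGF = Callable[[float], float]


def smallest_fixed_point(f: PGF, tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Smallest root of f(s) = s in [0, 1] for a probability generating function f.

    Starting from s = 0, the iterates f(0), f(f(0)), ... increase monotonically
    to the smallest fixed point, i.e. the extinction probability."""
    s = 0.0
    for _ in range(max_iter):
        s_next = f(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


def major_outbreak_probability(f_hat: PGF, f_tilde: PGF) -> float:
    """1 - f_hat(sigma) for BP(1, C_hat, C_tilde), sigma the smallest root of f_tilde(s) = s."""
    sigma = smallest_fixed_point(f_tilde)
    return 1.0 - f_hat(sigma)


def poisson_pgf(mean: float) -> PGF:
    return lambda s: math.exp(mean * (s - 1.0))


if __name__ == "__main__":
    f = poisson_pgf(2.0)  # illustrative offspring law only
    print(major_outbreak_probability(f, f))  # approximately 0.797
```

Iteration is chosen over a root finder because it needs no bracketing and always lands on the smallest root; convergence slows only near criticality, where $f_{\tilde{C}}'(1) = R_*$ is close to 1.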

3.2 Backward processes 

We now consider the expected final size of a major outbreak. Again our analysis is of the $m \to \infty$ limiting epidemic process, for which we find the probability that a given individual is infected in the event of a major outbreak. By an exchangeability argument this probability is equal to the asymptotic mean proportion of the population (individuals, not households) that is ultimately infected by a major outbreak. This quantity serves as our approximation of the expected proportion infected in a major outbreak in a finite population. We determine the probability that a given individual is infected by considering its susceptibility set (cf. Ball and Lyne (2001) and Ball and Neal (2002)).

The idea behind susceptibility sets is that for each individual in the population we can, by sampling from the infectious period distribution and then the relevant Poisson processes, make a (random) list of other individuals it would infect were it to be infected itself. We then construct a digraph (directed graph) based on these lists, in which the vertices represent individuals in the population and we put a directed arc from $i$ to $j$ when, were $i$ to become infected, it would make infectious contact with $j$, i.e. if $j$ is in $i$'s list. The susceptibility set of individual $i$ consists of those individuals from which there exists a path to $i$ in the digraph (including $i$ itself). Note that an individual will become infected by an epidemic if and only if the initial infective is in its susceptibility set. We also need the concept of a local susceptibility set, constructed in the same way but considering only local (within-household) infectious contacts.
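The susceptibility set is simply the set of vertices that can reach $i$, which a breadth-first search over the reversed digraph finds. The Python sketch below, using a small hypothetical set of infection lists, also checks the duality noted above: $b$ is infected from initial infective $a$ exactly when $a$ lies in $b$'s susceptibility set. The function names are our own.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set


def _reach(adj: Dict[Hashable, List[Hashable]], start: Hashable) -> Set[Hashable]:
    """All vertices reachable from `start` along arcs of `adj` (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def infected_set(lists: Dict[Hashable, Iterable[Hashable]], initial: Hashable) -> Set[Hashable]:
    """Individuals ultimately infected when `initial` is the initial infective."""
    return _reach({i: list(js) for i, js in lists.items()}, initial)


def susceptibility_set(lists: Dict[Hashable, Iterable[Hashable]], target: Hashable) -> Set[Hashable]:
    """Individuals with a directed path to `target` (including target itself)."""
    reverse: Dict[Hashable, List[Hashable]] = {}
    for i, js in lists.items():
        for j in js:
            reverse.setdefault(j, []).append(i)  # arc i -> j becomes j -> i
    return _reach(reverse, target)


if __name__ == "__main__":
    # Hypothetical lists: lists[i] holds the individuals i would infect.
    lists = {1: [2], 2: [3], 3: [8], 4: [3], 5: [1], 6: [5], 7: []}
    print(susceptibility_set(lists, 3))  # {1, 2, 3, 4, 5, 6}
    print(infected_set(lists, 6))        # {1, 2, 3, 5, 6, 8}
```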

We approximate the size of the susceptibility set of an individual chosen uniformly at random from the population by the total progeny of an appropriate branching process. To construct this branching process we break up the susceptibility set into 'generations' in much the same way as we look at the spread of infection in the early stages of the epidemic. Starting with an individual $i$, consider those individuals $j$, not in $i$'s household, who are in $i$'s susceptibility set by virtue of an arc leading from $j$ to an individual in $i$'s local susceptibility set. These individuals are all in different households with high probability as $m \to \infty$, and the households they are in comprise the first 'generation' of the susceptibility set. Repeating this process for each of these individuals $j$ (i.e. looking at the individuals who make infectious global contact with a member of $j$'s local susceptibility set) gives the second 'generation', and by continuing this process we can construct the whole of $i$'s susceptibility set. Because, with high probability, each individual $j$ that joins the susceptibility set by virtue of a global contact is in a household not previously associated with the susceptibility set, the number of households in each generation is approximated well by the branching process $\mathrm{BP}(1, \hat{B}, \tilde{B})$, where $\hat{B}$ and $\tilde{B}$ denote the offspring random variables for the initial and subsequent generations, which again are typically different.

We show in Section 6.5.2 that, as $m \to \infty$, the conditional probability that a typical initial susceptible ($i$ say) is infected, given that a major outbreak occurs, is given by the probability that the branching process $\mathrm{BP}(1, \hat{B}, \tilde{B})$ avoids extinction (see Theorem 2). An intuitive explanation of this result is as follows. As $m \to \infty$, (i) the number of households in $i$'s susceptibility set converges in distribution to the total progeny of $\mathrm{BP}(1, \hat{B}, \tilde{B})$; and (ii) a major outbreak necessarily infects at least $\log m$ households (cf. Lemma 6). Thus, as $m \to \infty$, the probability that $i$'s susceptibility set intersects one of these $\log m$ infected households tends to 0 if $\mathrm{BP}(1, \hat{B}, \tilde{B})$ goes extinct and to 1 otherwise. The latter result follows because if $\mathrm{BP}(1, \hat{B}, \tilde{B})$ does not go extinct then the size of $i$'s susceptibility set is of exact order $m$ as $m \to \infty$.

The above claim and standard branching process theory imply that the expected relative final size of a major outbreak in a large finite population is approximately $1 - f_{\hat{B}}(\xi)$, where $\xi$ is the smallest solution of $f_{\tilde{B}}(s) = s$ in $[0, 1]$. Calculation of $f_{\hat{B}}(s)$ and $f_{\tilde{B}}(s)$ is considered in Section 4.2.
4 Calculations



4.1 Forward process 

Consider first the threshold parameter $R_* = E[\tilde{C}]$. Label the individuals in a household $0, 1, \ldots, n-1$, with individual 0 the initial infective, and define $X_i$ to be the indicator of the event that individual $i$ is infected in the local (i.e. single-household) epidemic and $C_i$ to be the number of global neighbours with which $i$ makes infectious contact, if $i$ were to become infected. Then

$$\tilde{C} = C_0 + \sum_{i=1}^{n-1} X_i C_i, \tag{4.1}$$

and it follows, since $C_i$ and $X_i$ are independent for each $i$ ($X_i$ is determined by the local epidemic among the other household members, whereas $C_i$ depends only on $i$'s own infectious period, degree and global contacts) and $(C_1, X_1), (C_2, X_2), \ldots, (C_{n-1}, X_{n-1})$ are identically distributed, that

$$R_* = E[C_0] + E[T]\,E[C_1], \tag{4.2}$$

where $T = \sum_{i=1}^{n-1} X_i$ is the final size of the within-household epidemic (not counting the initial infective). Denote by $I_i$ and $K_i$ the infectious period and the number of global neighbours, not including its infector, of individual $i$ (excluding the infector only affects the initial infective within the household). Now, since infectious contacts between different pairs of individuals are independent, $C_i \mid K_i, I_i \sim \mathrm{Bin}(K_i, 1 - e^{-\lambda_G I_i})$. Thus $E[C_i \mid K_i, I_i] = K_i(1 - e^{-\lambda_G I_i})$, whence, by the independence of $K_i$ and $I_i$,

$$E[C_i] = E[K_i]\,\bigl(1 - \phi(\lambda_G)\bigr). \tag{4.3}$$

Now, for $i = 1, 2, \ldots, n-1$, $K_i$ has the same distribution as $D$, the prescribed degree distribution, so $E[K_i] = \mu_D$ for such $i$. However, for the reasons noted in the first paragraph of Section 2, since the initial infective in the household was infected by a global infection, its degree has the size-biased distribution $\tilde{D}$, and because one of these neighbours (the one that infected it) has already been infected, $K_0$ has the same distribution as $\tilde{D} - 1$, so $E[K_0] = E[\tilde{D}] - 1$. It follows from the definition of $\tilde{D}$ that $E[\tilde{D}] = \sum_k k^2 p_k/\mu_D = E[D^2]/E[D] = E[D] + \mathrm{Var}(D)/E[D]$. Substituting these into (4.3) and then (4.2), and letting $\mu_T = E[T]$, yields

$$R_* = \left(\mu_D(\mu_T + 1) + \frac{\sigma_D^2}{\mu_D} - 1\right)\bigl(1 - \phi(\lambda_G)\bigr). \tag{4.4}$$

The mean $\mu_T$ may be evaluated (typically numerically) by using equations (2.25) and (2.26) of Ball (1986), thus enabling $R_*$ to be calculated.
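Equation (4.4) is straightforward to evaluate once $\mu_T$ is known. In the Python sketch below, the illustrative case uses households of size $n = 2$ with a fixed infectious period $I = c$, where $\mu_T = 1 - e^{-c\lambda_L}$ because the single other household member escapes infection with probability $e^{-c\lambda_L}$. It also takes $D \sim \mathrm{Poi}(5)$ (so $\sigma_D^2 = \mu_D$); these parameter values are assumptions chosen only for illustration. The second function rebuilds the same number from (4.2) and (4.3) as a consistency check.

```python
import math


def r_star(mu_D: float, var_D: float, mu_T: float, phi_lam_G: float) -> float:
    """Threshold parameter R_* of equation (4.4)."""
    return (mu_D * (mu_T + 1.0) + var_D / mu_D - 1.0) * (1.0 - phi_lam_G)


def r_star_from_parts(mu_D: float, var_D: float, mu_T: float, phi_lam_G: float) -> float:
    """Same quantity assembled from (4.2)-(4.3): E[C_0] + E[T] E[C_1]."""
    p_G = 1.0 - phi_lam_G
    mean_C0 = (mu_D + var_D / mu_D - 1.0) * p_G  # K_0 ~ D~ - 1
    mean_C1 = mu_D * p_G                         # K_1 ~ D
    return mean_C0 + mu_T * mean_C1


if __name__ == "__main__":
    # Illustrative assumptions: n = 2, I = c a.s., D ~ Poisson(5), lambda_L = 1, lambda_G = 0.1.
    c, lam_L, lam_G, mu_D = 1.0, 1.0, 0.1, 5.0
    mu_T = 1.0 - math.exp(-c * lam_L)
    print(r_star(mu_D, mu_D, mu_T, math.exp(-c * lam_G)))  # approximately 0.777
```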
Calculation of the PGFs $f_{\hat{C}}(s)$ and $f_{\tilde{C}}(s)$ is more difficult because the number of global infections caused by a particular individual depends on that individual's infectious period, which also influences whether or not other individuals in the household become infective and thus the number of global contacts they might make. It is possible to use the notion of 'final state random variables' introduced by Ball and O'Neill (1999) to find $f_{\hat{C}}$ and $f_{\tilde{C}}$, but it is not straightforward, so we do not present it here. This methodology will be discussed in a forthcoming paper concentrating on the more applied aspects of our model. However, there are two special cases where the above dependencies do not exist and the analysis is much simpler. These are when the infectious period is fixed (i.e. almost surely equal to a given constant) and when the infectious period can be only zero or infinity.

Trapman (2007) describes (using results of Kuulasmaa (1982)) how these special cases lead to bounds on quantities of interest for a very general class of epidemic models. Trapman's arguments hold for any epidemic model where there is only one 'kind' of infectious contact, rather than the two (local and global) that we are concerned with, but the methods can be easily adapted. In addition, a fixed infectious period is often a reasonable assumption to make in practice, and it is commonly used because it leads to simplifications of the kind shown shortly (see, for example, Britton et al. (2007) and Britton et al. (2008)). We therefore proceed to calculate the PGFs $f_{\hat{C}}$ and $f_{\tilde{C}}$ in these two special cases, as they can be used to calculate the above-mentioned bounds and may also give insight into the importance of, and interplay between, the parameters of our model. The role of the infectious period distribution will be discussed in the above-mentioned applied paper.

4.1.1 Zero or infinite infectious period. 

Suppose that $P(I = \infty) = 1 - P(I = 0) = p$ for some $p \in [0, 1]$. For the moment we ignore the differences between the initial and subsequent generations and denote the generic offspring random variable by unadorned $C$. Here we have

$$C = \begin{cases} 0, & \text{with probability } 1 - p, \\ C_0 + \sum_{i=1}^{n-1} C_i, & \text{with probability } p, \end{cases}$$

where $C_i$ is the number of global neighbours infected by an infectious individual $i$. (An initial infective with $I = \infty$ infects all of its household members and all of its global neighbours.) Thus $C_0 \stackrel{D}{=} K_0$ (where $\stackrel{D}{=}$ denotes equality in distribution) and $C_1, C_2, \ldots, C_{n-1}$ are independent and identically distributed with

$$C_i = \begin{cases} 0, & \text{with probability } 1 - p, \\ K_i, & \text{with probability } p. \end{cases}$$

Also note that the number, $N$ say, of the $n - 1$ $C_i$'s which take the value $K_i$ (i.e. the number of initially susceptible individuals in the household with $I = \infty$) is binomially distributed with parameters $n - 1$ and $p$. We therefore have

$$\begin{aligned} f_C(s) = E[s^C] &= (1 - p)s^0 + p\,E\Bigl[s^{C_0 + \sum_{i=1}^{N} K_i}\Bigr] \\ &= 1 - p + p\,E[s^{C_0}]\,E\Bigl[s^{\sum_{i=1}^{N} K_i}\Bigr] \\ &= 1 - p + p\,f_{K_0}(s)\,f_N\bigl(f_D(s)\bigr) \\ &= 1 - p + p\,f_{K_0}(s)\,\{1 - p + p f_D(s)\}^{n-1}, \end{aligned}$$

where $K_0$ is $D$ or $d$ in the initial generation and $\tilde{D} - 1$ in subsequent generations (in which case the PGF is $f_{\tilde{C}}$ rather than $f_{\hat{C}}$).
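A Python sketch of the zero-or-infinite formula, assuming for illustration that $D \sim \mathrm{Poi}(\mu)$. Under that assumption $\tilde{D} - 1$ is again $\mathrm{Poi}(\mu)$, so $f_{\tilde{D}-1} = f_D$, and with a randomly chosen initial infective $f_{\hat{C}} = f_{\tilde{C}}$. As a cross-check, $f_{\tilde{C}}'(1) = p\mu(1 + (n-1)p)$, which agrees with (4.4) since here $\mu_T = (n-1)p$ and $1 - \phi(\lambda_G) = p$. The helper names are hypothetical.

```python
import math
from typing import Callable

PGF = Callable[[float], float]


def smallest_fixed_point(f: PGF, tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Smallest root of f(s) = s in [0, 1], by iterating from s = 0."""
    s = 0.0
    for _ in range(max_iter):
        s_next = f(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


def poisson_pgf(mean: float) -> PGF:
    return lambda s: math.exp(mean * (s - 1.0))


def forward_pgf_zero_inf(p: float, n: int, f_K0: PGF, f_D: PGF) -> PGF:
    """f_C(s) = 1 - p + p f_K0(s) {1 - p + p f_D(s)}^(n-1), where p = P(I = infinity)."""
    return lambda s: 1.0 - p + p * f_K0(s) * (1.0 - p + p * f_D(s)) ** (n - 1)


if __name__ == "__main__":
    # Illustrative assumptions: D ~ Poisson(mu), so D~ - 1 ~ Poisson(mu) as well.
    mu, p, n = 2.0, 0.5, 3
    f_D = poisson_pgf(mu)
    f_tilde = forward_pgf_zero_inf(p, n, f_D, f_D)  # K_0 ~ D~ - 1
    f_hat = forward_pgf_zero_inf(p, n, f_D, f_D)    # K_0 ~ D (random initial infective)
    sigma = smallest_fixed_point(f_tilde)
    print("P(major outbreak) ~", 1.0 - f_hat(sigma))
```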

4.1.2 Fixed infectious period. 

Now suppose that $P(I = c) = 1$ for some $c > 0$. Again we temporarily ignore the differences between the initial and subsequent generations, label the individuals $0, 1, \ldots, n-1$ and denote by $C_i$ the number of global neighbours infected by an infectious individual $i$. Then, letting $T$ denote the final size of the within-household epidemic and labelling the household members infected by it $1, 2, \ldots, T$, we have $C = C_0 + \sum_{i=1}^{T} C_i$ and, conditional on the final size, $C_1, C_2, \ldots, C_T$ are mutually independent. Now $C_i \mid K_i \sim \mathrm{Bin}(K_i, 1 - e^{-c\lambda_G})$, so $f_{C_i}(s) = f_{K_i}(1 - p_G + s p_G)$, where $p_G = 1 - e^{-c\lambda_G}$. Thus, by the usual formula for the PGF of a random sum,

$$f_C(s) = f_{C_0}(s)\,f_T\bigl(f_{C_1}(s)\bigr) = f_{K_0}(1 - p_G + s p_G)\,f_T\bigl(f_D(1 - p_G + s p_G)\bigr), \tag{4.5}$$

where again $K_0$ is $D$ or $d$ in the initial generation and $\tilde{D} - 1$ in subsequent generations. The PGF $f_T$ is easily calculated using Theorem 2.6 of Ball (1986).
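A Python sketch of (4.5). In place of Theorem 2.6 of Ball (1986), it computes the distribution of $T$ by a technique we swap in: with a fixed infectious period $c$, the within-household final size coincides with that of a Reed–Frost chain binomial epidemic with per-pair escape probability $q = e^{-c\lambda_L}$, which can be iterated exactly generation by generation. Poisson degrees (so that $f_{\tilde{D}-1} = f_D$) and the parameter values in the usage example are illustrative assumptions, as are the function names.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

PGF = Callable[[float], float]


def reed_frost_final_size(n_sus: int, q: float) -> List[float]:
    """P(T = t), t = 0..n_sus, for one initial infective and n_sus susceptibles,
    where q is the probability that a given infective fails to infect a given
    susceptible. Iterates the Reed-Frost chain binomial generation by generation."""
    states: Dict[Tuple[int, int], float] = {(n_sus, 1): 1.0}  # (susceptibles, new infectives)
    final = [0.0] * (n_sus + 1)
    while states:
        nxt: Dict[Tuple[int, int], float] = defaultdict(float)
        for (s, i), prob in states.items():
            if i == 0 or s == 0:  # epidemic over: n_sus - s were infected
                final[n_sus - s] += prob
                continue
            p_inf = 1.0 - q**i  # a susceptible escapes all i infectives w.p. q^i
            for k in range(s + 1):
                nxt[(s - k, k)] += prob * math.comb(s, k) * p_inf**k * (1.0 - p_inf) ** (s - k)
        states = nxt
    return final


def forward_pgf_fixed(c: float, lam_L: float, lam_G: float, n: int, f_K0: PGF, f_D: PGF) -> PGF:
    """Equation (4.5): f_C(s) = f_K0(1 - p_G + s p_G) f_T(f_D(1 - p_G + s p_G))."""
    p_G = 1.0 - math.exp(-c * lam_G)
    t_pmf = reed_frost_final_size(n - 1, math.exp(-c * lam_L))

    def f_T(x: float) -> float:
        return sum(pt * x**t for t, pt in enumerate(t_pmf))

    def f_C(s: float) -> float:
        x = 1.0 - p_G + s * p_G
        return f_K0(x) * f_T(f_D(x))

    return f_C


def poisson_pgf(mean: float) -> PGF:
    return lambda x: math.exp(mean * (x - 1.0))


if __name__ == "__main__":
    # Illustrative: n = 3, I = 1, lambda_L = 1, lambda_G = 0.1, D ~ Poisson(10),
    # random initial infective, so f_hat = f_tilde and P(major) = 1 - sigma.
    f_D = poisson_pgf(10.0)
    f_C = forward_pgf_fixed(1.0, 1.0, 0.1, 3, f_D, f_D)
    sigma = 0.0
    for _ in range(10_000):  # fixed-point iteration for the extinction probability
        sigma = f_C(sigma)
    print("P(major outbreak) ~", 1.0 - sigma)
```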

4.2 Backward process 

Now consider the branching process approximation of the growth (as described in Section 3.2) of the susceptibility set of an individual chosen uniformly at random from the population. The offspring distribution of this process is the distribution of the number of individuals that make global contact with the local susceptibility set of a single individual, say individual $i$. Again we have a distinction between the initial and subsequent generations, but we ignore this for now and denote the random variable of interest by $B$. Firstly we write

$$B = B_0 + \sum_{j=1}^{M} B_j,$$

where $B_j$ is the number of global neighbours of individual $j$ that make infectious contact with $j$ (again labelling the individuals within the household $0, 1, \ldots, n-1$, with 0 corresponding to the primary individual $i$, and taking $1, \ldots, M$ to be the other members of $i$'s local susceptibility set) and $M$ is the size of $i$'s local susceptibility set, not counting $i$ itself. (If $M = 0$ then $i$'s local susceptibility set consists of only $i$ itself and the sum is empty.) Now $B_j \mid K_j \sim \mathrm{Bin}(K_j, p_G)$, where $K_j$ is the number of global neighbours of $j$ excluding, in the case of the primary individual, the individual it made contact with in order to join the susceptibility set, and $p_G = 1 - \phi(\lambda_G)$ is the probability that an infective individual makes infectious contact with a given global neighbour. We do not need to condition on the infectious period of individual $j$ because the contacts we are considering come from other individuals; the independence of the infectious periods of these individuals implies that they make contacts with $j$ independently of each other. For a similar reason, $B_0, B_1, \ldots, B_M$ are independent. Arguing as in the derivation of (4.5) yields

$$f_B(s) = f_K(1 - p_G + s p_G)\,f_M\bigl(f_D(1 - p_G + s p_G)\bigr), \tag{4.6}$$

where now $K$ is $D$ in the initial generation (because the root individual was chosen uniformly at random) and $\tilde{D} - 1$ in subsequent generations.

In order to determine $f_M$ we use equation (3.5) of Ball and Neal (2002), which gives a triangular system of linear equations whose solution is the mass function of $M$, from which one can easily calculate the PGF. Note that (4.6) holds for any choice of infectious period distribution. It is easily verified that in the fixed infectious period case $T \stackrel{D}{=} M$, so $f_M(s) = f_T(s)$ and, if the initial infective is chosen uniformly at random from the population, $f_{\hat{B}}(s) = f_{\hat{C}}(s)$. In the zero or infinite infectious period case, a household member belongs to $i$'s local susceptibility set if and only if its infectious period is infinite, so $M \sim \mathrm{Bin}(n - 1, p)$, where $p = P(I = \infty)$, whence $f_M(s) = (1 - p + ps)^{n-1}$.
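In the zero-or-infinite case, (4.6) becomes $f_B(s) = f_K(1 - p + sp)\{1 - p + p f_D(1 - p + sp)\}^{n-1}$, since $p_G = 1 - \phi(\lambda_G) = p$. The Python sketch below, again assuming Poisson degrees for illustration (so that $D$ and $\tilde{D} - 1$ share a PGF), evaluates both the major outbreak probability from Section 4.1.1 and the expected relative final size $1 - f_{\hat{B}}(\xi)$. Unlike the fixed infectious period case, the two quantities differ here. Parameter values and function names are hypothetical.

```python
import math
from typing import Callable

PGF = Callable[[float], float]


def smallest_fixed_point(f: PGF, tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Smallest root of f(s) = s in [0, 1], by iterating from s = 0."""
    s = 0.0
    for _ in range(max_iter):
        s_next = f(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


def poisson_pgf(mean: float) -> PGF:
    return lambda x: math.exp(mean * (x - 1.0))


def backward_pgf_zero_inf(p: float, n: int, f_K: PGF, f_D: PGF) -> PGF:
    """(4.6) with p_G = p and f_M(s) = (1 - p + p s)^(n-1)."""
    def f_B(s: float) -> float:
        x = 1.0 - p + s * p
        return f_K(x) * (1.0 - p + p * f_D(x)) ** (n - 1)
    return f_B


def forward_pgf_zero_inf(p: float, n: int, f_K0: PGF, f_D: PGF) -> PGF:
    """f_C(s) = 1 - p + p f_K0(s) {1 - p + p f_D(s)}^(n-1), from Section 4.1.1."""
    return lambda s: 1.0 - p + p * f_K0(s) * (1.0 - p + p * f_D(s)) ** (n - 1)


if __name__ == "__main__":
    mu, p, n = 2.0, 0.5, 3
    f_D = poisson_pgf(mu)  # Poisson D: D and D~ - 1 share this PGF
    f_C = forward_pgf_zero_inf(p, n, f_D, f_D)
    f_B = backward_pgf_zero_inf(p, n, f_D, f_D)
    prob = 1.0 - f_C(smallest_fixed_point(f_C))
    size = 1.0 - f_B(smallest_fixed_point(f_B))
    print(f"P(major outbreak) ~ {prob:.3f}, expected relative final size ~ {size:.3f}")
```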

5 Numerical results 

We now explore, numerically, some of the features of our model and investigate how they depend on some of its parameters. As a way of examining how the household size $n$ affects the model, Figure 1 shows the critical values of the per-pair global contact rate $\lambda_G$ and the per-individual local contact rate $\lambda_L(n-1)$, above which the epidemic is supercritical, for several household sizes, with the degree distribution and infectious period distribution fixed. Note that the expected total rate of global contacts per individual remains constant over these plots since $D$ is held fixed. Note also that if $\lambda_L = 0$ then $n$ is immaterial, as is $\lambda_L$ when $n = 1$. In these situations there is no local contact, so we recover the standard network model, and its critical value of $\lambda_G$ is the point to which the plotted lines converge as $\lambda_L \to 0$. The plot reflects the fact that, even as the per-individual total contact rate remains constant, increasing the household size spreads the potential infectious contacts over a larger number of neighbours, thus avoiding repeated contacts with the same individual and increasing the spread of the disease. We also observe that, fixing $D$ and letting $\lambda_L \to \infty$, the critical value of $\lambda_G$ tends to that for the standard network model with the same infectious period distribution and degree distribution $\sum_{i=1}^{n} D_i$, where the $D_i$ are independent copies of $D$. This is because, in this limit, once an individual is infected the whole household that it is in necessarily becomes infected. It is easily verified using (4.4): for an almost surely positive infectious period, $\mu_T \to n - 1$, so $R_* \to (n\mu_D + \sigma_D^2/\mu_D - 1)(1 - \phi(\lambda_G))$, which is the threshold parameter of the standard network model whose degree distribution has mean $n\mu_D$ and variance $n\sigma_D^2$.

Figure 1: Critical values of $\lambda_G$ and $\lambda_L(n-1)$ above which the epidemic is supercritical, for $n = 2, 3, \ldots, 10$ (top to bottom in the plot). Other parameters are $I = 1$ and $D \sim \mathrm{Poi}(5)$ (i.e. Poisson with mean 5).
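For a fixed infectious period $I = c$ the critical value of $\lambda_G$ has a closed form. Writing (4.4) as $R_* = A(1 - e^{-c\lambda_G})$ with $A = \mu_D(\mu_T + 1) + \sigma_D^2/\mu_D - 1$, setting $R_* = 1$ gives $\lambda_G = -c^{-1}\log(1 - 1/A)$ when $A > 1$. If $A \le 1$, no finite $\lambda_G$ makes the epidemic supercritical. The Python sketch below implements this and checks the $\lambda_L \to \infty$ limit discussed above. For $D \sim \mathrm{Poi}(5)$ and $c = 1$, the $\lambda_L = 0$ value is $-\log(4/5) \approx 0.223$ by this formula. The function name and test values are our own illustrative choices.

```python
import math


def critical_lambda_G(mu_D: float, var_D: float, mu_T: float, c: float) -> float:
    """Solve R_* = 1 in (4.4) for lambda_G when I = c almost surely.

    R_* = A (1 - exp(-c lambda_G)) with A = mu_D (mu_T + 1) + var_D / mu_D - 1."""
    A = mu_D * (mu_T + 1.0) + var_D / mu_D - 1.0
    if A <= 1.0:
        return math.inf  # R_* <= A <= 1 for every finite lambda_G
    return -math.log(1.0 - 1.0 / A) / c


if __name__ == "__main__":
    mu_D, var_D, c = 5.0, 5.0, 1.0  # D ~ Poisson(5), I = 1 (illustrative)
    print("lambda_L = 0:", critical_lambda_G(mu_D, var_D, 0.0, c))
    for lam_L in (0.5, 2.0, 10.0):
        mu_T = 1.0 - math.exp(-c * lam_L)  # households of size n = 2
        print(f"n = 2, lambda_L = {lam_L}:", critical_lambda_G(mu_D, var_D, mu_T, c))
```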

Perhaps the most interesting aspect of this model to explore is the dependence of its behaviour on the distribution of $D$, the number of global neighbours of a typical individual. Considerable research, conjecture and discussion have gone into trying to determine distributions which capture the features of many real-life contact networks; Section III.C of Newman (2003) has an extensive list of references. In Figure 2 we investigate the probability of a major outbreak in our epidemic model for various distributions $D$ with different properties, in particular different tail behaviours. We use the standard Poisson and geometric (with support including 0) distributions, as well as an almost surely constant degree and two variants of heavy-tailed distributions. The first has mass function

$$p_k \propto \begin{cases} k^{-1}, & k = 1, 2, \ldots, k^*, \\ k^{-a}, & k = k^* + 1, k^* + 2, \ldots, \end{cases}$$

and the second, with mass function $p_k \propto k^{-a} e^{-k/\kappa}$ ($k = 1, 2, \ldots$), is a power law with exponential cut-off, which has gained much attention in recent physics literature. We denote these distributions by $\mathrm{Pow}(k^*, a)$ and $\mathrm{PowC}(\kappa, a)$, respectively. The behaviour of these plots for relatively small values of $\mu_D$ (where the model is close to critical) is largely determined by the probability of $D$ taking very large values, i.e. the tail of the distribution, as this dictates what opportunity the disease might have to really 'take hold'; however, when $\mu_D$ is large the behaviour of $D$ at small values is more important, as the epidemic can usually move quite freely and this determines the chance that it might be contained by the network structure.

Figure 2: The probability of a major outbreak versus $\mu_D$ for different classes of degree distribution $D$. The distribution labelled 'Power law' is $\mathrm{Pow}(k^*, 7/2)$, for $k^* = 5, 6, \ldots, 18$, and the distribution labelled 'Power cutoff' is $\mathrm{PowC}(\kappa, 3/2)$, for $\kappa \in [10, 485]$ (smaller values of $k^*$ or $\kappa$ yield subcritical epidemics). The other parameters of the model are $n = 3$, $I = 1$, $\lambda_L = 1$ and $\lambda_G = 1/10$.
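The horizontal axis of Figure 2 is $\mu_D$, which for the cut-off power law is controlled by $\kappa$. The Python sketch below computes the mass function of $\mathrm{PowC}(\kappa, a)$ and its first two moments, which are the inputs needed by (4.4). Truncating the support at `kmax` (we use $50\kappa$, where $e^{-k/\kappa}$ is negligible) is our own approximation, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def powc_pmf(kappa: float, a: float, kmax: int) -> List[float]:
    """Truncated PowC(kappa, a): p_k proportional to k^(-a) exp(-k / kappa), k = 1..kmax.

    Index 0 holds p_0 = 0; the truncation at kmax is an approximation."""
    weights = [k ** (-a) * math.exp(-k / kappa) for k in range(1, kmax + 1)]
    total = sum(weights)
    return [0.0] + [w / total for w in weights]


def moments(pmf: List[float]) -> Tuple[float, float]:
    """Mean and variance of a mass function indexed by k = 0, 1, ..."""
    mu = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mu) ** 2 * p for k, p in enumerate(pmf))
    return mu, var


if __name__ == "__main__":
    for kappa in (10, 100, 485):
        mu, var = moments(powc_pmf(kappa, 1.5, kmax=50 * kappa))
        print(f"kappa={kappa}: mu_D={mu:.3f}, sigma_D^2={var:.1f}")
```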
We also briefly investigate whether our asymptotic methods give reasonable approximations to the quantities of interest in finite populations. We estimate the probability and expected relative final size of a major outbreak in finite populations from simulations and compare these with the results we get from our asymptotic analysis. Each simulation consists of generating a random network and running one epidemic on it. Figure 3 shows estimates of these quantities of interest for increasing numbers $m$ of households, together with the theoretical ($m = \infty$) values, for two choices of degree distribution. The estimates of the major outbreak probability are based on 10,000 simulations for each parameter combination, and the simulations that result in a major outbreak are then used to estimate the expected relative final size. We have plotted point estimates of the quantities of interest, together with error bounds based on $\pm 2$ standard errors (SE) of the estimator. For the probability of a major outbreak, estimated as $\hat{p}$, $\mathrm{SE} = [\hat{p}(1 - \hat{p})/n_0]^{1/2}$, where $n_0 = 10{,}000$ is the number of simulations. For the relative final size, $\mathrm{SE} = \hat{\sigma}\, n_1^{-1/2}$, where $\hat{\sigma}^2$ is the sample variance of the relative final sizes and $n_1$ is the number of simulations that resulted in a major outbreak.
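The two standard errors above can be computed from a list of simulated relative final sizes as in the Python sketch below. It classifies a run as a major outbreak when its relative final size exceeds the cutoff of 0.15 described in the next paragraph; the strict inequality, the function names and the toy data in the usage example are our own assumptions. For instance, $\hat{p} = 0.7$ with $n_0 = 10{,}000$ gives $\mathrm{SE} = (0.21/10{,}000)^{1/2} \approx 0.0046$.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def outbreak_probability_estimate(final_sizes: List[float], cutoff: float = 0.15) -> Tuple[float, float]:
    """(p_hat, SE) with SE = sqrt(p_hat (1 - p_hat) / n0), n0 = number of simulations."""
    n0 = len(final_sizes)
    p_hat = sum(z > cutoff for z in final_sizes) / n0
    return p_hat, math.sqrt(p_hat * (1.0 - p_hat) / n0)


def relative_final_size_estimate(final_sizes: List[float], cutoff: float = 0.15) -> Tuple[float, float]:
    """(mean, SE) over major outbreaks, with SE = sigma_hat / sqrt(n1).

    statistics.variance is the sample variance and needs at least two major outbreaks."""
    major = [z for z in final_sizes if z > cutoff]
    n1 = len(major)
    return mean(major), math.sqrt(variance(major) / n1)


if __name__ == "__main__":
    sizes = [0.01, 0.02, 0.01, 0.5, 0.6, 0.7, 0.8, 0.9, 0.7, 0.7]  # toy data
    p_hat, se_p = outbreak_probability_estimate(sizes)
    z_bar, se_z = relative_final_size_estimate(sizes)
    print(f"p_hat = {p_hat:.3f} +/- {2 * se_p:.3f}, z_bar = {z_bar:.3f} +/- {2 * se_z:.3f}")
```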

Note that in small finite populations the determination of a cutoff for whether a particular final size constitutes a major outbreak is practically impossible; only once the population size is sufficiently large (for $m$ larger than about 100 in our simulations) does the distinction become clear. In our calculations we have used a cutoff of 0.15 of the population size, this being determined by inspecting histograms of the relative final size of the simulations. Also note that the vertical scale of plots (a) and (c) is different from that of plots (b) and (d). Figure 3 shows that our asymptotic results give good approximations for these quantities of interest for populations of only a few hundred households. Although the asymptotic values of both the major outbreak probability and the expected relative final size seem to consistently overestimate these values for the finite populations (as one would expect, since the approximating branching process treats each global infection as an infection of a previously uninfected household, thus overestimating disease spread), even for populations of only 100 households the relative error is much less than 5%. It also seems that having a heavy-tailed degree distribution may make the convergence to the asymptotic value a little slower (compare plots (b) and (d) at around 200 to 500 households), but the effect seems to be only very slight. Another interesting observation is that the relative final size seems to be estimated appreciably more efficiently by our simulation methods than the probability of a major outbreak. This is owing (at least in part) to the fact that from each simulation we simply observe the occurrence or otherwise of a major outbreak (one observation of the forward process), whereas when a major outbreak does occur, the proportion infected
has information about the susceptibility set of every initial susceptible in the
population, that is, many (highly correlated) observations of the backward process.

[Figure 3 panels: (a) and (b) $D \sim \mathrm{Poi}(8)$; (c) and (d) $D \sim \mathrm{Pow}(10, 7/2)$.]

Figure 3: Comparison of simulation estimates of major outbreak probability
and expected relative final size for finite populations with asymptotic results.
The Poisson degree distribution (plots (a) and (b)) has $\mu_D = \sigma_D^2 = 8$ and
the power law distribution (plots (c) and (d)) has $\mu_D \approx 8.04$ and $\sigma_D^2 \approx 96$.
Other parameters are $n = 3$, $I = 1$, $\lambda_L = 1$ and $\lambda_G = 1/10$.

6 Proofs 
6.1 Overview 

In this section we provide a fully rigorous justification of the results
discussed in Section 3 concerning the threshold behaviour of the epidemic model
and its final outcome in the event of a major outbreak. This subsection
gives a brief outline of our methods of proof. The starting point is a
sequence $\mathbf{D} = (D_1, D_2, \ldots)$ of independent copies of $D$. For $m = 1, 2, \ldots$,
$(D_1, D_2, \ldots, D_{mn})$ is used to give the degrees of the $mn$ individuals in a
population of $m$ households. We then define a realisation of the epidemic,
$\mathcal{E}^{(m)}$ say, viewed on a generation basis, and a realisation of an approximating
branching process, say $Y^{(m)} = (Y_k^{(m)}, k = 0, 1, \ldots)$ (see Section 6.2). In
$\mathcal{E}^{(m)}$ the network is formed, i.e. the half-edges are paired up, as the epidemic
progresses. The branching process $Y^{(m)}$ is similar to the branching process, $Y$
say, described in Section 3.1, except that the empirical distribution of the degrees
$D_1, D_2, \ldots, D_{mn}$ is used in place of the degree distribution $D$. The epidemic
$\mathcal{E}^{(m)}$ and approximating branching process $Y^{(m)}$ are coupled so that they
coincide until a random number, $\Gamma^{(m)} + 1$, of households have been infected
in $\mathcal{E}^{(m)}$. It is shown that $\mathbb{P}(\Gamma^{(m)} > k) \to 1$ as $m \to \infty$ for all $k \in \mathbb{Z}_+$, so
$Z^{(m)}$, the number of households infected in $\mathcal{E}^{(m)}$, and $\bar{Y}^{(m)}$, the total progeny
of $Y^{(m)}$, have the same limiting distribution as $m \to \infty$. (We use $\mathbb{Z}_+$ to
denote the positive integers including 0 and $\mathbb{N}$ to denote the strictly positive
integers.) Now, $\bar{Y}^{(m)}$ converges in distribution to $\bar{Y}$ as $m \to \infty$, so $Z^{(m)}$ is
asymptotically distributed as $\bar{Y}$, the total progeny of $Y$ (see Theorem 1),
thus providing a formal justification of the threshold behaviour described in
Section 3.1.

Suppose now that $R_* > 1$, so that major outbreaks are possible. Let
$t_m = \lfloor 2\log\log m/\log R_* \rfloor$, where, for $x \in \mathbb{R}$, $\lfloor x \rfloor$ denotes the greatest integer $\le x$.
We show (cf. Lemma 7) that there exists $\beta > 1$ such that
$\lim_{m\to\infty} \mathbb{P}(\log m \le Z_{t_m}^{(m)} \le (\log m)^\beta) = \mathbb{P}(\bar{Y} = \infty)$, where $Z_{t_m}^{(m)}$ is the number of infectious
households in generation $t_m$ of $\mathcal{E}^{(m)}$. It follows that, with probability tending
to 1 as $m \to \infty$, a major outbreak has at least $\log m$ and at most $(\log m)^\beta$
infectious households in generation $t_m$.

We next consider the probability that a typical individual, $i_*$ say, who
is susceptible at time $t_m$ in $\mathcal{E}^{(m)}$ ultimately becomes infected. We do this
by stopping the construction of $\mathcal{E}^{(m)}$ at time $t_m$, leaving the infectious
('live') half-edges of the $Z_{t_m}^{(m)}$ infectious households unconnected, and constructing
the susceptibility set, $\mathcal{S}^{(m)}$ say, of $i_*$ in 'generations' as described in Section 3.2,
pairing up the half-edges as we construct the susceptibility set. If at any point
in the construction of $\mathcal{S}^{(m)}$ a half-edge is paired up with one of the live
half-edges from the epidemic then $i_*$ is ultimately infected; otherwise $i_*$ is not
infected by the epidemic. Note that for any individual, $i$ say, in $\mathcal{S}^{(m)}$ we need
to explore all of $i$'s global neighbours (and not just those that join $\mathcal{S}^{(m)}$), since
if any half-edge emanating from $i$ is paired with one of the live half-edges then
$i_*$ is ultimately infected. Thus we need to construct simultaneously $\mathcal{A}^{(m)}$, the
set of global neighbours of $\mathcal{S}^{(m)}$, also on a generation basis.

Let $(S^{(m)}, A^{(m)}) = ((S_k^{(m)}, A_k^{(m)}), k = 0, 1, \ldots)$ describe the number of
households in successive generations of $(\mathcal{S}^{(m)}, \mathcal{A}^{(m)})$. In Section 6.2, we
construct realisations of $(S^{(m)}, A^{(m)})$ and an approximating two-type branching
process $(X^{(m)}, X_A^{(m)}) = ((X_k^{(m)}, X_{A,k}^{(m)}), k = 0, 1, \ldots)$. The process $X^{(m)}$ is a
single-type branching process that is similar to the branching process, $X$ say,
described in Section 3.2, except that, as with $Y^{(m)}$, the empirical distribution of
$D_1, D_2, \ldots, D_{mn}$ is used instead of the degree distribution $D$. The process $X_A^{(m)}$
corresponds to global neighbours of $\mathcal{S}^{(m)}$ who are not in $\mathcal{S}^{(m)}$; individuals
in $X_A^{(m)}$ have no offspring. The processes $(S^{(m)}, A^{(m)})$ and $(X^{(m)}, X_A^{(m)})$ are
coupled so that they coincide until $\hat\Gamma^{(m)} + 1$ households have joined $\mathcal{S}^{(m)}$,
where $\mathbb{P}(\hat\Gamma^{(m)} > k) \to 1$ as $m \to \infty$ for all $k \in \mathbb{Z}_+$. Let $W^{(m)}$ and $W_A^{(m)}$ denote
the number of households in $\mathcal{S}^{(m)}$ and $\mathcal{A}^{(m)}$, respectively, and let $\bar{X}^{(m)}$
and $\bar{X}$ denote the total progenies of $X^{(m)}$ and $X$, respectively. As with $\mathcal{E}^{(m)}$,
$W^{(m)}$ and $\bar{X}^{(m)}$ have the same limiting distribution as $m \to \infty$, which, since
$\bar{X}^{(m)}$ converges in distribution to $\bar{X}$ as $m \to \infty$, is given by the distribution
of $\bar{X}$. Now, for any $k \in \mathbb{N}$, if $W^{(m)} + W_A^{(m)} < k$ then the probability that a half-edge
of $\mathcal{S}^{(m)}$ is paired with one of the live half-edges tends to 0 as $m \to \infty$ (since a
major outbreak has at most $(\log m)^\beta$ infectious households at generation $t_m$
of the forward process), so the limiting (as $m \to \infty$) probability that $i_*$ is
ultimately infected by a major outbreak is at most $\mathbb{P}(\bar{X} = \infty)$. (Note that
$(X^{(m)}, X_A^{(m)})$ goes extinct if and only if $X^{(m)}$ goes extinct.)

We also construct, for all sufficiently small $\epsilon \in (0, 1)$, a branching process
${}_\epsilon X^{(m)}$, which is a lower bound for $S^{(m)}$ as long as $W^{(m)} < \epsilon m$; whence
$\mathbb{P}(W^{(m)} \ge \epsilon m) \ge \mathbb{P}({}_\epsilon\bar{X}^{(m)} = \infty)$, where ${}_\epsilon\bar{X}^{(m)}$ denotes the total progeny of
${}_\epsilon X^{(m)}$. As $m \to \infty$, ${}_\epsilon\bar{X}^{(m)}$ converges in distribution to ${}_\epsilon\bar{X}$, the total progeny
of a branching process ${}_\epsilon X$ say. Moreover, for any $\epsilon > 0$, if $W^{(m)} \ge \epsilon m$ the
probability that a half-edge of $\mathcal{S}^{(m)}$ is paired with one of the live half-edges tends to 1 as
$m \to \infty$ (since a major outbreak has at least $\log m$ infectious households
at generation $t_m$ of the forward process), so the limiting probability that $i_*$
is ultimately infected is at least $\mathbb{P}({}_\epsilon\bar{X} = \infty)$. Furthermore, $\mathbb{P}({}_\epsilon\bar{X} = \infty) \to
\mathbb{P}(\bar{X} = \infty)$ as $\epsilon \downarrow 0$, which, combined with the result described at the end
of the previous paragraph, shows that the probability that $i_*$ is ultimately
infected by a major outbreak tends to $\mathbb{P}(\bar{X} = \infty)$ as $m \to \infty$ (see Theorem 2).
It follows that the expected proportion of the population that is infected
by a major outbreak also tends to $\mathbb{P}(\bar{X} = \infty)$ as $m \to \infty$ (see Corollary 2).

Our results are proved by conditioning on the degree sequence $\mathbf{D}$ and
showing that they hold for $\mathbb{P}$-almost all $\mathbf{D}$. The unconditional results then
follow using the dominated convergence theorem. As remarked above, the
network of global contacts is now constructed as the epidemic/susceptibility
set evolves, not a priori as in our model description in Section 2. This
implicitly means that, rather than conditioning on the total number of half-edges
$\sum_{i=1}^{mn} D_i$ being even, we simply ignore the single left-over half-edge
in the event of $\sum_{i=1}^{mn} D_i$ being odd. This small change does not affect the
asymptotic results as $m \to \infty$ (cf. van der Hofstad et al. (2007, Section 1.1)).

The remainder of this section is organised as follows. The main
constructions are described in Section 6.2, with the epidemics $\mathcal{E}^{(m)}$ and their
approximating branching processes being described in Section 6.2.1 and the
susceptibility set processes and their approximating branching processes
being described in Section 6.2.2. Some notation concerning the offspring
distributions of various branching processes is given in Section 6.2.3. Section 6.3
contains some preliminary results, and the main results are given in
Sections 6.4 and 6.5, which analyse the epidemics $\mathcal{E}^{(m)}$ and the susceptibility
set processes $(S^{(m)}, A^{(m)})$, respectively.

6.2 Construction of approximating branching processes 

Let $(\Omega_1, \mathcal{F}_1, \mathbb{P}_1)$ be a probability space, on which is defined a sequence
$\mathbf{D} = (D_1, D_2, \ldots)$ of independent random variables, each distributed according to
the degree distribution $D$. Also let $(\Omega_2, \mathcal{F}_2, \mathbb{P}_2)$ be a probability space, on
which are defined the following mutually independent random quantities:

(i) for every $(\mathbf{d}, j) = ((d_1, d_2, \ldots, d_n), j) \in \mathbb{Z}_+^n \times \{1, 2, \ldots, n\}$, a sequence
of random variables $\Phi_1^{(\mathbf{d},j)}, \Phi_2^{(\mathbf{d},j)}, \ldots$, which are independent copies of
the random variable $\Phi^{(\mathbf{d},j)}$ defined below;

(ii) for every $(\mathbf{d}, j) \in \mathbb{Z}_+^n \times \{1, 2, \ldots, n\}$, a sequence of random variables
$(\Psi_1^{(\mathbf{d},j)}, \Psi_{A,1}^{(\mathbf{d},j)}), (\Psi_2^{(\mathbf{d},j)}, \Psi_{A,2}^{(\mathbf{d},j)}), \ldots$, which are independent copies of the
random variable $(\Psi^{(\mathbf{d},j)}, \Psi_A^{(\mathbf{d},j)})$ also defined below.

We also require other random variables defined on $(\Omega_2, \mathcal{F}_2, \mathbb{P}_2)$, but these are
described only informally because the detail is unnecessary for our proofs.
The random variable $\Phi^{(\mathbf{d},j)}$ describes the number of global neighbours
with which infectious contact is made by members of a household of
individuals with degrees given by $\mathbf{d} = (d_1, d_2, \ldots, d_n)$ in which individual $j$ is
initially infected, and is defined as follows. Let $G$ be the random directed
graph on the vertices $V = \{1, 2, \ldots, n\}$ obtained as follows. For each vertex
$i$ we take an independent realisation, $I_i$ say, of the infectious period
distribution $I$ and then put an arc from $i$ to each other vertex in $V$ independently
with probability $1 - e^{-\lambda_L I_i}$. Given $G$, let $C_1, C_2, \ldots, C_n$ be independent
random variables with $C_i \mid I_1, I_2, \ldots, I_n \sim \mathrm{Bin}(d_i', 1 - e^{-\lambda_G I_i})$, where $d_i' = d_i$ if
$i \neq j$ and $d_j' = d_j - 1$ (one of individual $j$'s half-edges is the one along which
$j$ was infected). Then $\Phi^{(\mathbf{d},j)} = \sum_{i=1}^n 1_{\{j \rightsquigarrow i\}} C_i$, where $j \rightsquigarrow i$ denotes the
event that there is a path from vertex $j$ to vertex $i$ in $G$ (with the convention
that $i \rightsquigarrow i$).
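The definition of $\Phi^{(\mathbf{d},j)}$ translates directly into a sampler. The sketch below is illustrative only: the function name `sample_phi` and its interface (a caller-supplied function `sample_I` drawing one infectious period) are assumptions, individuals are indexed from 0 rather than 1, and the binomial variables $C_i$ are generated only for vertices reachable from $j$, which does not change the distribution of the sum.

```python
import math
import random


def sample_phi(d, j, lam_L, lam_G, sample_I, rng):
    """Draw one copy of Phi^{(d, j)}.

    d: degrees (d_1, ..., d_n) of the household members, indexed from 0.
    j: index of the member infected from outside the household.
    lam_L, lam_G: local and global infection rates.
    sample_I: function rng -> one draw of the infectious period I.
    """
    n = len(d)
    periods = [sample_I(rng) for _ in range(n)]

    # Random directed graph G: arc i -> k (k != i) w.p. 1 - exp(-lam_L * I_i).
    arcs = [[k for k in range(n)
             if k != i and rng.random() < 1 - math.exp(-lam_L * periods[i])]
            for i in range(n)]

    # Vertices i with j ~> i (depth-first search; j ~> j by convention).
    reached, stack = {j}, [j]
    while stack:
        i = stack.pop()
        for k in arcs[i]:
            if k not in reached:
                reached.add(k)
                stack.append(k)

    # C_i ~ Bin(d'_i, 1 - exp(-lam_G * I_i)), with d'_j = d_j - 1.
    total = 0
    for i in reached:
        d_prime = d[i] - 1 if i == j else d[i]
        p_global = 1 - math.exp(-lam_G * periods[i])
        total += sum(rng.random() < p_global for _ in range(d_prime))
    return total
```

For the initial individual of the whole epidemic, the construction in Section 6.2.1 uses the household type $\mathbf{d} + \mathbf{e}_j$, which in this sketch corresponds to calling `sample_phi` with `d[j]` increased by one.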

In a similar manner, the two components of the random variable $(\Psi^{(\mathbf{d},j)}, \Psi_A^{(\mathbf{d},j)})$
describe the numbers of global neighbours of the local susceptibility set of
individual $j$, in a household of individuals with degrees given by $\mathbf{d} = (d_1, d_2, \ldots, d_n)$,
that do and do not, respectively, make global infectious contact with their neighbour in that
susceptibility set. To this end, let $G$ be the random graph described above
and, conditional on $G$, let $B_1, B_2, \ldots, B_n$ be independent random variables
with $B_i \sim \mathrm{Bin}(d_i', p_G)$, where $d_1', d_2', \ldots, d_n'$ are as above and $p_G = 1 - \phi(\lambda_G)$,
with $\phi(\theta) = \mathbb{E}[e^{-\theta I}]$. We then have
$$(\Psi^{(\mathbf{d},j)}, \Psi_A^{(\mathbf{d},j)}) = \sum_{i=1}^n 1_{\{i \rightsquigarrow j\}} (B_i, d_i' - B_i).$$
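A companion sketch for $(\Psi^{(\mathbf{d},j)}, \Psi_A^{(\mathbf{d},j)})$ differs from a sampler for $\Phi^{(\mathbf{d},j)}$ only in searching $G$ backwards (collecting all $i$ with $i \rightsquigarrow j$) and in splitting each $d_i'$ into contacting and non-contacting neighbours. As before, the name `sample_psi`, the 0-based indexing and the `sample_I` interface are assumptions; the contact probability $p_G = 1 - \phi(\lambda_G)$ is simply passed in by the caller.

```python
import math
import random


def sample_psi(d, j, lam_L, p_G, sample_I, rng):
    """Draw one copy of (Psi^{(d, j)}, Psi_A^{(d, j)}).

    d: degrees of the household members, indexed from 0.
    j: index of the individual whose local susceptibility set is built.
    p_G: probability 1 - phi(lambda_G) that a global neighbour makes
        infectious contact along an edge (supplied by the caller).
    sample_I: function rng -> one draw of the infectious period I.
    """
    n = len(d)
    periods = [sample_I(rng) for _ in range(n)]
    arcs = [[k for k in range(n)
             if k != i and rng.random() < 1 - math.exp(-lam_L * periods[i])]
            for i in range(n)]

    # Local susceptibility set {i : i ~> j}: search backwards along arcs.
    into = [[] for _ in range(n)]
    for i in range(n):
        for k in arcs[i]:
            into[k].append(i)
    in_set, stack = {j}, [j]
    while stack:
        k = stack.pop()
        for i in into[k]:
            if i not in in_set:
                in_set.add(i)
                stack.append(i)

    # B_i ~ Bin(d'_i, p_G) neighbours make contact; d'_i - B_i do not.
    psi = psi_A = 0
    for i in in_set:
        d_prime = d[i] - 1 if i == j else d[i]
        b = sum(rng.random() < p_G for _ in range(d_prime))
        psi += b
        psi_A += d_prime - b
    return psi, psi_A
```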

We now introduce some further notation. For $k = 1, 2, \ldots$, let $\mathbf{D}_k =
(D_{k1}, D_{k2}, \ldots, D_{kn})$, where, for $i = 1, 2, \ldots, n$, $D_{ki} = D_{(k-1)n+i}$ is the degree
of the $i$th individual in the $k$th household. Let $H_k = \sum_{i=1}^n D_{ki}$ denote the
total degree of the $k$th household. Lastly, denote by $\mu_H^{(m)} = \frac{1}{m}\sum_{i=1}^m H_i$
the (empirical) mean number of edges emanating from each of the first $m$
households.

The epidemic, susceptibility sets and approximating branching processes
are defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P}) = (\Omega_1, \mathcal{F}_1, \mathbb{P}_1) \times (\Omega_2, \mathcal{F}_2, \mathbb{P}_2)$.
Our construction and most of our calculations will henceforth be conditional
on the degree sequence $\mathbf{D}$. To this end, we denote $\mathbb{P}(\cdot \mid \mathbf{D})$ by $\mathbb{P}_{\mathbf{D}}(\cdot)$ and
similarly $\mathbb{E}[\cdot \mid \mathbf{D}]$ by $\mathbb{E}_{\mathbf{D}}[\cdot]$. Conditional on this degree sequence and for
every $m = 1, 2, \ldots$, we now describe the construction of a branching process,
$Y^{(m)}$, which approximates the early stages of the spread of the epidemic
amongst households $1, 2, \ldots, m$; then another (two-type) branching process,
$(X^{(m)}, X_A^{(m)})$, which approximates the 'early growth' of the susceptibility set
(and its global neighbours) of a typical initially susceptible individual in that
population.
6.2.1 The forward processes.

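We first describe the branching process $Y^{(m)}$. Set $Y_0^{(m)} = 1$ and choose an
individual uniformly at random from $1, 2, \ldots, mn$. Suppose it is individual
$i \in \{1, 2, \ldots, n\}$ of household $A \in \{1, 2, \ldots, m\}$. Then $Y_1^{(m)} = \Phi_1^{(\mathbf{D}_A + \mathbf{e}_i, i)}$,
where $\mathbf{e}_i$ is the unit $n$-vector with a 1 in the $i$th position (adding $\mathbf{e}_i$
compensates for the reduction $d_i' = d_i - 1$ in the definition of $\Phi^{(\mathbf{d},i)}$, since the
initial individual is not infected along one of its own half-edges). For subsequent
generations $k \ge 2$, we continue the construction as follows. For each
$j = 1, 2, \ldots, Y_{k-1}^{(m)}$, sample a half-edge uniformly at random from the $m\mu_H^{(m)}$
half-edges in the population and, supposing it emanates from individual $i$ of
household $A$, set $Y_{kj}^{(m)} = \Phi_{\nu(A,i)+1}^{(\mathbf{D}_A, i)}$, where $\nu(A, i)$ is the number of times we
have sampled previously from the sequence $\Phi_1^{(\mathbf{D}_A,i)}, \Phi_2^{(\mathbf{D}_A,i)}, \ldots$. Lastly, set
$$Y_k^{(m)} = \sum_{j=1}^{Y_{k-1}^{(m)}} Y_{kj}^{(m)}.$$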
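The construction of $Y^{(m)}$ can be sketched as a short simulation, conditional on a given degree sequence. This is a minimal illustration rather than the coupled construction used in the proofs: the names `simulate_Y` and `phi` are hypothetical, `phi(d, i, rng)` is assumed to return an independent draw of $\Phi^{(\mathbf{d},i)}$, individuals and households are indexed from 0, and each required copy of $\Phi$ is drawn afresh, which has the same distribution as reading the next unused element $\Phi_{\nu(A,i)+1}^{(\mathbf{D}_A,i)}$ of an i.i.d. sequence.

```python
import random


def simulate_Y(degrees, n, phi, generations, rng):
    """Generation sizes [Y_0, ..., Y_generations] of Y^{(m)} (generations >= 1).

    degrees: D_1, ..., D_{mn}; household a (0-based) holds individuals
        a*n, ..., a*n + n - 1.
    phi: function (d, i, rng) -> one draw of Phi^{(d, i)}, with d a tuple
        and i a 0-based index (hypothetical interface).
    """
    m = len(degrees) // n
    households = [tuple(degrees[a * n:(a + 1) * n]) for a in range(m)]
    # One entry per half-edge, so rng.choice picks one of the
    # m * mu_H^{(m)} half-edges uniformly at random.
    half_edges = [(a, i) for a in range(m) for i in range(n)
                  for _ in range(households[a][i])]

    sizes = [1]
    # Generation 1: a uniformly chosen individual i of household A,
    # whose household type is D_A + e_i.
    a, i = divmod(rng.randrange(m * n), n)
    d0 = list(households[a])
    d0[i] += 1
    sizes.append(phi(tuple(d0), i, rng))

    # Generations k >= 2: each generation-(k-1) individual samples a
    # half-edge and draws a fresh copy of Phi^{(D_A, i)}.
    for _ in range(2, generations + 1):
        total = 0
        for _ in range(sizes[-1]):
            a, i = rng.choice(half_edges)
            total += phi(households[a], i, rng)
        sizes.append(total)
    return sizes
```

Storing one list entry per half-edge keeps the uniform half-edge choice trivially correct; it implicitly selects household $A$ with probability proportional to $H_A$, which is the size-biasing that later appears as $\tilde p_{\mathbf{d}}^{(m)}$.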

The branching process $Y^{(m)}$ and the epidemic process $\mathcal{E}^{(m)}$ can be coupled
by using the same $\mathbf{D}$, $\Phi$'s and uniformly random samples. However,
the coupling breaks down as soon as a half-edge is sampled that emanates
from a household that either has been used previously in the epidemic or
is a neighbour of such a previously used household. If a previously used
half-edge is sampled then in $\mathcal{E}^{(m)}$ another half-edge needs to be sampled.
If an unused half-edge that emanates from a previously used household is
sampled then in $\mathcal{E}^{(m)}$ the spread of the epidemic within that household is
different from that in $Y^{(m)}$, since there are fewer susceptibles. Finally, if a
half-edge emanating from a household neighbouring a household previously used
in $\mathcal{E}^{(m)}$ is sampled then the spread of the epidemic from that household is in
general different from that in $Y^{(m)}$, since the (effective) degree distribution
of individuals in that household may be different from that assumed in $Y^{(m)}$.
(When constructing $\mathcal{E}^{(m)}$ one also needs to pair up non-infectious half-edges
from infectious individuals.) In all of these cases the construction of $\mathcal{E}^{(m)}$ can
be continued appropriately, but the detail is not important for our purposes.
However, we do need a bound on the size of, and number of half-edges that
emanate from, the 'bad set' of households that must be avoided in order that
$Y^{(m)}$ and $\mathcal{E}^{(m)}$ remain coupled. To that end we describe another branching
process $T^{(m)} = (T_k^{(m)}, k = 0, 1, \ldots)$, which provides such a bound.

Let $T_0^{(m)}$ be the total degree of the initial household in $Y^{(m)}$, so $T_0^{(m)} = H_A$,
where $A$ is as above. For $k = 1, 2, \ldots$, $T_k^{(m)}$ is determined as follows.
For each $j = 1, 2, \ldots, T_{k-1}^{(m)}$, a half-edge is sampled uniformly at random from
the $m\mu_H^{(m)}$ half-edges in the population; say this half-edge emanates from
household $A_j$, and then put $T_{kj}^{(m)} = H_{A_j} - 1$. Finally, set $T_k^{(m)} = \sum_{j=1}^{T_{k-1}^{(m)}} T_{kj}^{(m)}$.
The processes $Y^{(m)}$, $\mathcal{E}^{(m)}$ and $T^{(m)}$ can be coupled in an obvious fashion
so that their sampled half-edges correspond. Let $\bar{T}^{(m)} = (\bar{T}_k^{(m)} = \sum_{i=0}^k T_i^{(m)},
k = 0, 1, \ldots)$, so that $\bar{T}_k^{(m)}$ is the total progeny of $T^{(m)}$ up to generation $k$. Then $2\bar{T}_{k+1}^{(m)}$ provides
an upper bound for the number of half-edges that emanate from (and hence
also for the size of) the bad set of households in generation $k$ of $\mathcal{E}^{(m)}$. The
index $k + 1$ arises because the bad set consists not just of all households
infected up to generation $k$ of $\mathcal{E}^{(m)}$, but also of their neighbouring households.
The factor 2 arises because $T^{(m)}$ does not count the receiving half-edge when
the half-edges are paired up.

The above construction of $Y^{(m)}$ (and implicitly $\mathcal{E}^{(m)}$) is continued for a
fixed number of generations, $t_m$, and $T^{(m)}$ is continued for $t_m + 1$ generations.
(Of course, some or all of these processes may die out beforehand.)

6.2.2 The backward processes. 

The two-type branching process $(X^{(m)}, X_A^{(m)})$ is defined analogously to $Y^{(m)}$,
except that the random variables $(\Psi_i^{(\mathbf{d},j)}, \Psi_{A,i}^{(\mathbf{d},j)})$ are used instead of $\Phi_i^{(\mathbf{d},j)}$ (recall
that there are no offspring in $X_A^{(m)}$). The process $X^{(m)}$ approximates the
growth, described by generations as in Section 3.2, of the susceptibility set of
an individual chosen uniformly at random from all susceptible individuals at
time $t_m$ in the epidemic process $\mathcal{E}^{(m)}$, and $X_A^{(m)}$ approximates the number of
global neighbours of this susceptibility set, also on a generation basis. The
processes $(X^{(m)}, X_A^{(m)})$ and $(S^{(m)}, A^{(m)})$ can be coupled in a similar fashion
to that used for $Y^{(m)}$ and $\mathcal{E}^{(m)}$, though note that now the coupling breaks
down if a sampled half-edge emanates from either (i) a household previously
used in the susceptibility set, (ii) a household neighbouring such a household,
or (iii) a household, or neighbour of a household, used in the forward process
up to time $t_m$. Also note that this coupling may break down at generation 0
(if the initial individual is in a household that is either infected in $\mathcal{E}^{(m)}$ or
a neighbour of a household infected in $\mathcal{E}^{(m)}$). As with the epidemic process
$\mathcal{E}^{(m)}$, the construction of $(S^{(m)}, A^{(m)})$ can be continued appropriately after
the coupling breaks down, but we do not require such detail.

6.2.3 Further notation and limiting processes. 



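For $m = 1, 2, \ldots$, let $\tilde{c}^{(m)} = (\tilde{c}_0^{(m)}, \tilde{c}_1^{(m)}, \ldots)$ and $c^{(m)} = (c_0^{(m)}, c_1^{(m)}, \ldots)$ denote
the offspring distributions of the initial individual and all subsequent
individuals, respectively, in $Y^{(m)}$. For $\mathbf{d} = (d_1, d_2, \ldots, d_n) \in \mathbb{Z}_+^n$, let
$$p_{\mathbf{d}}^{(m)} = \frac{1}{m}\sum_{k=1}^m 1_{\{\mathbf{D}_k = \mathbf{d}\}} \quad\text{and}\quad \tilde{p}_{\mathbf{d}}^{(m)} = \frac{|\mathbf{d}|\, p_{\mathbf{d}}^{(m)}}{\mu_H^{(m)}}, \qquad (6.1)$$
where $|\mathbf{d}| = \sum_{j=1}^n d_j$. Then the 'household type' (i.e. the degrees of individuals
within the household) of the initial individual in $Y^{(m)}$ is distributed
according to $p_{\mathbf{d}}^{(m)}$ ($\mathbf{d} \in \mathbb{Z}_+^n$) and the household type of any subsequent individual
is distributed according to $\tilde{p}_{\mathbf{d}}^{(m)}$ ($\mathbf{d} \in \mathbb{Z}_+^n$). It follows that, for $k = 0, 1, \ldots$,
$$\tilde{c}_k^{(m)} = \sum_{\mathbf{d} \in \mathbb{Z}_+^n} p_{\mathbf{d}}^{(m)}\, \mathbb{P}(\tilde{\Phi}_{\mathbf{d}} = k) \quad\text{and}\quad c_k^{(m)} = \sum_{\mathbf{d} \in \mathbb{Z}_+^n} \tilde{p}_{\mathbf{d}}^{(m)}\, \mathbb{P}(\Phi_{\mathbf{d}} = k),$$
where $\tilde{\Phi}_{\mathbf{d}}$ and $\Phi_{\mathbf{d}}$ are random variables with distributions given by
$$\mathbb{P}(\tilde{\Phi}_{\mathbf{d}} = k) = \frac{1}{n}\sum_{i=1}^n \mathbb{P}(\Phi^{(\mathbf{d}+\mathbf{e}_i, i)} = k) \quad (k = 0, 1, \ldots, |\mathbf{d}|), \qquad (6.2)$$
and
$$\mathbb{P}(\Phi_{\mathbf{d}} = k) = \sum_{i=1}^n \frac{d_i}{|\mathbf{d}|}\, \mathbb{P}(\Phi^{(\mathbf{d},i)} = k) \quad (k = 0, 1, \ldots, |\mathbf{d}| - 1). \qquad (6.3)$$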

For $m = 1, 2, \ldots$, the offspring distributions of the initial and subsequent
individuals in $X^{(m)}$, $\tilde{b}^{(m)}$ and $b^{(m)}$, are defined analogously to $\tilde{c}^{(m)}$ and $c^{(m)}$,
using (6.1)-(6.3) with $\tilde{\Phi}$ replaced by $\tilde{\Psi}$ and $\Phi$ by $\Psi$ throughout. Replacing
$\tilde{\Phi}$ by $(\tilde{\Psi}, \tilde{\Psi}_A)$ and $\Phi$ by $(\Psi, \Psi_A)$ throughout gives the offspring distributions
associated with the two-type process $(X^{(m)}, X_A^{(m)})$.
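The empirical household-type distributions in (6.1) are easy to compute from a degree sequence. The sketch below is a hypothetical helper (the name `household_type_distributions` is not from the text) that returns $p_{\mathbf{d}}^{(m)}$ and $\tilde{p}_{\mathbf{d}}^{(m)}$ as exact fractions, assuming that the first $mn$ degrees are grouped into consecutive households of size $n$ as in the definition of $\mathbf{D}_k$.

```python
from collections import Counter
from fractions import Fraction


def household_type_distributions(degrees, n):
    """Empirical household-type distributions (6.1), m = len(degrees) // n.

    Returns (p, p_tilde): dicts mapping a household type d (a tuple) to
    p_d^{(m)}, the proportion of households of type d, and to
    tilde p_d^{(m)} = |d| p_d^{(m)} / mu_H^{(m)}, the probability that a
    uniformly chosen half-edge belongs to a household of type d.
    """
    m = len(degrees) // n
    types = [tuple(degrees[a * n:(a + 1) * n]) for a in range(m)]
    p = {d: Fraction(count, m) for d, count in Counter(types).items()}
    mu_H = Fraction(sum(sum(d) for d in types), m)  # mean household degree
    p_tilde = {d: sum(d) * p_d / mu_H for d, p_d in p.items() if sum(d) > 0}
    return p, p_tilde
```

With degrees `[1, 2, 1, 2, 0, 1]` and $n = 2$ there are two households of type $(1, 2)$ and one of type $(0, 1)$, so $\mu_H^{(3)} = 7/3$ and the size-biased weights are $6/7$ and $1/7$: a half-edge is far more likely to land in a high-degree household than a uniformly chosen household is.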

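Further, for $m = 1, 2, \ldots$, let $r^{(m)} = (r_0^{(m)}, r_1^{(m)}, \ldots)$ denote the distribution
of the number of initial ancestors and $\tilde{r}^{(m)} = (\tilde{r}_0^{(m)}, \tilde{r}_1^{(m)}, \ldots)$ denote the
offspring distribution of both the ancestors and any subsequent individuals
in $T^{(m)}$. Then, for $k = 0, 1, \ldots$,
$$r_k^{(m)} = \sum_{\{\mathbf{d} \in \mathbb{Z}_+^n : |\mathbf{d}| = k\}} p_{\mathbf{d}}^{(m)} \quad\text{and}\quad \tilde{r}_k^{(m)} = \sum_{\{\mathbf{d} \in \mathbb{Z}_+^n : |\mathbf{d}| = k+1\}} \tilde{p}_{\mathbf{d}}^{(m)}.$$
For $m = 1, 2, \ldots$, let $\mu_c^{(m)} = \sum_{k=1}^\infty k c_k^{(m)}$ be the mean of the empirical
distribution $c^{(m)}$, and define $\mu_{\tilde{c}}^{(m)}$, $\mu_b^{(m)}$, $\mu_{\tilde{b}}^{(m)}$, $\mu_r^{(m)}$ and $\mu_{\tilde{r}}^{(m)}$ analogously.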

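In Section 6.3 we prove that the offspring distributions of $Y^{(m)}$, $X^{(m)}$ and
$T^{(m)}$ and the distribution of the number of ancestors in $T^{(m)}$ converge almost
surely as $m \to \infty$ to those of branching processes we denote by $Y$, $X$ and $T$,
respectively. To that end, for $\mathbf{d} \in \mathbb{Z}_+^n$, let $p_{\mathbf{d}} = \prod_{i=1}^n p_{d_i}$ and $\tilde{p}_{\mathbf{d}} = p_{\mathbf{d}}|\mathbf{d}|/(n\mu_D)$.
(Recall that $p_k = \mathbb{P}(D = k)$ ($k = 0, 1, \ldots$) and $\mu_D = \sum_{k=1}^\infty k p_k$.) Also, for
$k = 0, 1, \ldots$, let
$$p_H(k) = \sum_{\{\mathbf{d} \in \mathbb{Z}_+^n : |\mathbf{d}| = k\}} p_{\mathbf{d}} = \mathbb{P}(D_1 + D_2 + \cdots + D_n = k) = \mathbb{P}(H_1 = k)$$
and, for $k = 1, 2, \ldots$, let $\tilde{p}_H(k) = k p_H(k)/(n\mu_D)$. Now, for $k = 0, 1, \ldots$, let $c_k$
be defined analogously to $c_k^{(m)}$ but with $\tilde{p}_{\mathbf{d}}^{(m)}$ replaced by $\tilde{p}_{\mathbf{d}}$, and define $\tilde{c}_k$, $b_k$
and $\tilde{b}_k$ similarly (using $p_{\mathbf{d}}$ in place of $p_{\mathbf{d}}^{(m)}$ where appropriate). Also, for $k = 0, 1, \ldots$,
let $r_k = p_H(k)$ and $\tilde{r}_k = \tilde{p}_H(k + 1)$. Let $c = (c_0, c_1, \ldots)$ and define $\tilde{c}$, $b$, $\tilde{b}$, $r$ and $\tilde{r}$
similarly. Let $\mu_c = \sum_{k=1}^\infty k c_k$ and define $\mu_{\tilde{c}}$, $\mu_b$, $\mu_{\tilde{b}}$, $\mu_r$ and $\mu_{\tilde{r}}$ in the obvious fashion.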
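For a finitely supported degree distribution, $p_H$, $\tilde{p}_H$, $r$ and $\tilde{r}$ can be computed directly: $p_H$ is the $n$-fold convolution of $(p_k)$, and $\tilde{p}_H$ is its size-biased version. The following sketch (hypothetical function name, plain floating-point lists indexed by $k$) is one way to do this; the entry $\tilde{p}_H(0) = 0$ is included only for convenient indexing.

```python
def household_degree_pmfs(p, n):
    """Distributions p_H, tilde p_H, r and tilde r for a finitely supported D.

    p: list with p[k] = P(D = k). All returned lists are indexed by k.
    """
    p_H = [1.0]
    for _ in range(n):  # n-fold convolution: H_1 = D_1 + ... + D_n
        conv = [0.0] * (len(p_H) + len(p) - 1)
        for a, x in enumerate(p_H):
            for b, y in enumerate(p):
                conv[a + b] += x * y
        p_H = conv
    mu_D = sum(k * pk for k, pk in enumerate(p))
    p_H_tilde = [k * q / (n * mu_D) for k, q in enumerate(p_H)]  # size-biased
    r = list(p_H)            # r_k = p_H(k): degree of the initial household
    r_tilde = p_H_tilde[1:]  # tilde r_k = tilde p_H(k + 1)
    return p_H, p_H_tilde, r, r_tilde
```

For example, with $n = 2$ and $\mathbb{P}(D = 0) = \mathbb{P}(D = 1) = 1/2$, one gets $p_H = (1/4, 1/2, 1/4)$ and $\tilde{r} = (1/2, 1/2)$.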

Let $Y = (Y_0, Y_1, \ldots)$, $X = (X_0, X_1, \ldots)$ and $T = (T_0, T_1, \ldots)$ be the
branching processes $\mathrm{BP}(1, \tilde{c}, c)$, $\mathrm{BP}(1, \tilde{b}, b)$ and, in an obvious notation,
$\mathrm{BP}(r, \tilde{r}, \tilde{r})$, respectively. Note that the branching processes $Y$ and $X$ are
those described in Sections 3.1 and 3.2, respectively. Note especially that,
in the notation of Section 3.1, this implies that $\mu_c = R_*$. We also require a
two-type branching process $(X, X_A)$, defined analogously to $(X^{(m)}, X_A^{(m)})$ but
again using $p_{\mathbf{d}}$ and $\tilde{p}_{\mathbf{d}}$ in defining the offspring distribution instead of the
empirical versions $p_{\mathbf{d}}^{(m)}$ and $\tilde{p}_{\mathbf{d}}^{(m)}$.



6.3 Preliminary results 



In this section we collect some results required in the analysis of the forward
and backward processes. Recall that we have made the assumption that
$\sigma_D^2 = \mathrm{Var}(D)$ is finite (although some results only require $\mu_D < \infty$).

Lemma 1. There exists $A_1 \in \mathcal{F}_1$, with $\mathbb{P}_1(A_1) = 1$, such that, for all $\omega_1 \in A_1$,

(i) $\lim_{m\to\infty} \mu_H^{(m)}(\omega_1) = n\mu_D$;

(ii) $\lim_{m\to\infty} p_{\mathbf{d}}^{(m)}(\omega_1) = p_{\mathbf{d}}$ and $\lim_{m\to\infty} \tilde{p}_{\mathbf{d}}^{(m)}(\omega_1) = \tilde{p}_{\mathbf{d}}$ for all $\mathbf{d} \in \mathbb{Z}_+^n$;

(iii) $\lim_{m\to\infty} \tilde{c}^{(m)}(\omega_1) = \tilde{c}$, $\lim_{m\to\infty} c^{(m)}(\omega_1) = c$, $\lim_{m\to\infty} \tilde{b}^{(m)}(\omega_1) = \tilde{b}$,
$\lim_{m\to\infty} b^{(m)}(\omega_1) = b$, $\lim_{m\to\infty} r^{(m)}(\omega_1) = r$ and $\lim_{m\to\infty} \tilde{r}^{(m)}(\omega_1) = \tilde{r}$;

(iv) $\lim_{m\to\infty} \mu_{\tilde{c}}^{(m)}(\omega_1) = \mu_{\tilde{c}}$, $\lim_{m\to\infty} \mu_c^{(m)}(\omega_1) = \mu_c$, $\lim_{m\to\infty} \mu_{\tilde{b}}^{(m)}(\omega_1) = \mu_{\tilde{b}}$,
$\lim_{m\to\infty} \mu_b^{(m)}(\omega_1) = \mu_b$, $\lim_{m\to\infty} \mu_r^{(m)}(\omega_1) = \mu_r$ and $\lim_{m\to\infty} \mu_{\tilde{r}}^{(m)}(\omega_1) = \mu_{\tilde{r}}$.

Here and henceforth, convergence of a sequence of sequences is interpreted
elementwise, so, for example, $\lim_{m\to\infty} c^{(m)} = c$ means that $\lim_{m\to\infty} c_k^{(m)} = c_k$
for each $k = 0, 1, \ldots$.

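Proof. By the strong law of large numbers, there exists $A_2 \in \mathcal{F}_1$ with $\mathbb{P}_1(A_2) = 1$
such that $\lim_{m\to\infty} \mu_H^{(m)}(\omega_1) = n\mu_D$ ($\omega_1 \in A_2$) and, for each $\mathbf{d} \in \mathbb{Z}_+^n$, there
exists $A_{\mathbf{d}} \in \mathcal{F}_1$ with $\mathbb{P}_1(A_{\mathbf{d}}) = 1$ such that $\lim_{m\to\infty} p_{\mathbf{d}}^{(m)}(\omega_1) = p_{\mathbf{d}}$ ($\omega_1 \in A_{\mathbf{d}}$). Let
$A_3 = A_2 \cap \bigcap_{\mathbf{d} \in \mathbb{Z}_+^n} A_{\mathbf{d}}$. Then $\mathbb{P}_1(A_3) = 1$, since $\mathbb{Z}_+^n$ is countable, and it is easily
verified that (i) and (ii) hold for all $\omega_1 \in A_3$, whence (iii) also holds for all
$\omega_1 \in A_3$ by Scheffé's theorem (see, for example, Billingsley (1968, p. 224)).
Next, consider
$$\sum_{k=1}^\infty k c_k^{(m)} = \sum_{k=1}^\infty k \sum_{\mathbf{d} \in \mathbb{Z}_+^n} \tilde{p}_{\mathbf{d}}^{(m)}\, \mathbb{P}(\Phi_{\mathbf{d}} = k) = \sum_{k=1}^\infty k \sum_{\mathbf{d} \in \mathbb{Z}_+^n} \frac{|\mathbf{d}|}{m\mu_H^{(m)}} \sum_{i=1}^m 1_{\{\mathbf{D}_i = \mathbf{d}\}}\, \mathbb{P}(\Phi_{\mathbf{d}} = k) = \frac{1}{\mu_H^{(m)}}\, \frac{1}{m} \sum_{i=1}^m |\mathbf{D}_i| \sum_{k=1}^\infty k\, \mathbb{P}(\Phi_{\mathbf{D}_i} = k).$$
Now, $\mathbb{P}(\Phi_{\mathbf{D}_1} = k) = 0$ for $k > |\mathbf{D}_1| - 1$, so
$$\mathbb{E}\Big[|\mathbf{D}_1| \sum_{k=1}^\infty k\, \mathbb{P}(\Phi_{\mathbf{D}_1} = k)\Big] \le \mathbb{E}\big[|\mathbf{D}_1|(|\mathbf{D}_1| - 1)\big] < \infty,$$
as $\sigma_D^2 < \infty$. Thus, by the strong law of large numbers, there exists $A_4 \in \mathcal{F}_1$
with $\mathbb{P}_1(A_4) = 1$ such that, for all $\omega_1 \in A_4$,
$$\lim_{m\to\infty} \frac{1}{m} \sum_{i=1}^m |\mathbf{D}_i(\omega_1)| \sum_{k=1}^\infty k\, \mathbb{P}(\Phi_{\mathbf{D}_i(\omega_1)} = k) = \mathbb{E}\Big[|\mathbf{D}_1| \sum_{k=1}^\infty k\, \mathbb{P}(\Phi_{\mathbf{D}_1} = k)\Big] = n\mu_D \sum_{k=1}^\infty k c_k.$$
It follows, using (i), that $\lim_{m\to\infty} \mu_c^{(m)}(\omega_1) = \mu_c$ for all $\omega_1 \in A_3 \cap A_4$. Similar arguments
hold for the other means in (iv) and the lemma is thus proved. □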



Remark. Throughout the remainder of the paper, $A_1$ refers to a set that
satisfies the statement of Lemma 1.

The following result concerns the convergence of certain quantities associ- 
ated with a sequence of branching processes when their offspring distributions 
converge in distribution. 

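Lemma 2. Suppose that $\tilde{a}$, $a$, $\tilde{a}^{(m)}$ and $a^{(m)}$ ($m = 1, 2, \ldots$) are probability
distributions satisfying $\tilde{a}^{(m)} \to \tilde{a}$ and $a^{(m)} \to a$ as $m \to \infty$. Let $Y^{(m)} \sim
\mathrm{BP}(1, \tilde{a}^{(m)}, a^{(m)})$ ($m = 1, 2, \ldots$) and $Y \sim \mathrm{BP}(1, \tilde{a}, a)$. Then, denoting by
$\bar{Y}^{(m)}$ (respectively $\bar{Y}$) the total progeny of $Y^{(m)}$ (respectively $Y$),

(i) $\lim_{m\to\infty} \mathbb{P}(\bar{Y}^{(m)} = k) = \mathbb{P}(\bar{Y} = k)$ ($k = 1, 2, \ldots$);

(ii) $\lim_{m\to\infty} \mathbb{P}(\bar{Y}^{(m)} = \infty) = \mathbb{P}(\bar{Y} = \infty)$, provided $a_1 \neq 1$.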

Proof. Part (i) follows immediately by considering the sum of the probabilities
of the finite number of sample paths of $Y^{(m)}$ with $\bar{Y}^{(m)} = k$. Part (ii) is
a simple extension of Lemma 4.1 of Britton et al. (2007). □

Remarks. 

1. The condition in part (ii) of the lemma is in practice only a technical
condition, which will essentially always hold. As pointed out by Britton et
al. (2007), although the case $a_1 = 1$ really can be an exception (for
example if $a_0^{(m)} = 1 - a_1^{(m)} = 1/m$), such a scenario is, from an applied
viewpoint, decidedly pathological.

2. We sometimes use a slight variant of Lemma 2, where the branching
processes are indexed by $\epsilon \in (0, 1)$ and their offspring distributions
converge as $\epsilon \downarrow 0$. Of course, the analogous results hold, and the proof
is exactly the same.
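The limit in Lemma 2(ii) can be evaluated numerically for finitely supported offspring distributions using the standard fact that $\mathbb{P}(\bar{Y} = \infty) = 1 - \sum_k \tilde{a}_k q^k$, where $q$ is the smallest root in $[0, 1]$ of $q = \sum_k a_k q^k$. The sketch below (hypothetical function name) finds $q$ by fixed-point iteration from 0, which converges monotonically to that smallest root; near criticality the convergence becomes slow, so the tolerance and iteration cap are illustrative choices.

```python
def survival_probability(a_tilde, a, tol=1e-13, max_iter=1_000_000):
    """P(total progeny = infinity) for BP(1, a_tilde, a).

    a_tilde, a: finitely supported pmfs as lists, a[k] = P(k offspring).
    q, the extinction probability of the line of one subsequent
    individual, is the smallest fixed point in [0, 1] of the pgf of a;
    the initial individual's line dies out with probability pgf_tilde(q).
    """
    def pgf(pmf, s):
        return sum(pk * s ** k for k, pk in enumerate(pmf))

    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(a, q)
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - pgf(a_tilde, q)
```

The pathological case of Remark 1 shows up clearly here: with $a = (0, 1)$ the iteration stays at $q = 0$ and the survival probability is $1 - \tilde{a}_0$, whereas for $a^{(m)} = (1/m, 1 - 1/m)$ the fixed point is $q = 1$ and the survival probability is 0 for every $m$.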

Lastly, we have a result concerning the probability of picking a 'bad' 
half-edge in our constructions of the forward and backward processes. 

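Lemma 3. Suppose that, for each $m = 1, 2, \ldots$, we draw elements uniformly
at random, with replacement, from the set $\mathcal{X}^{(m)} = \{1, 2, \ldots, m\mu_H^{(m)}\}$. Suppose
also that, for each $m$, there is an increasing sequence of (random) sets
$\mathcal{J}_1^{(m)} \subseteq \mathcal{J}_2^{(m)} \subseteq \cdots \subseteq \mathcal{X}^{(m)}$, and at the $i$th pick we wish to avoid picking a member of
$\mathcal{J}_i^{(m)}$. Denote the $i$th pick by $x_i^{(m)}$ and let $\Gamma^{(m)} = \min\{i : x_i^{(m)} \in \mathcal{J}_i^{(m)}\} - 1$
be the number of picks we make before making a pick from a set we wish
to avoid. Suppose further that there exist strictly positive integers $g(m)$ and
$h(m)$ ($m = 1, 2, \ldots$) satisfying $\lim_{m\to\infty} g(m)h(m)m^{-1} = 0$ and
$$\lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(J_{g(m)}^{(m)} \le h(m)) = 1 \qquad (6.4)$$
for all $\omega_1 \in A_1$, where $J_i^{(m)} = |\mathcal{J}_i^{(m)}|$. Then, for all $\omega_1 \in A_1$,
$$\lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(\Gamma^{(m)} \ge g(m)) = 1. \qquad (6.5)$$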
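Proof. In view of (6.4), for $\omega_1 \in A_1$,
$$\liminf_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(\Gamma^{(m)} \ge g(m)) \ge \liminf_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}\big(\Gamma^{(m)} \ge g(m) \mid J_{g(m)}^{(m)} \le h(m)\big)\, \mathbb{P}_{\mathbf{D}(\omega_1)}\big(J_{g(m)}^{(m)} \le h(m)\big) \ge \liminf_{m\to\infty} \left(1 - \frac{h(m)}{m\mu_H^{(m)}(\omega_1)}\right)^{g(m)} = 1,$$
using (6.4), Lemma 1(i) and the fact that $g(m)h(m)m^{-1} \to 0$ as $m \to \infty$.
The assertion (6.5) then follows. □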
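To see the bound in the proof of Lemma 3 at work, the sketch below evaluates $(1 - h/N)^g$ with $N = m\mu_H^{(m)}$, $g(m) = k$ fixed and $h(m) = 2\log m$ (the choice later made in the proof of Theorem 1), and checks it by Monte Carlo against a fixed bad set of size $h$. The function names and the value $\mu_H = 24$ are illustrative assumptions; for a fixed bad set the bound is attained exactly, which is the worst case allowed by (6.4).

```python
import math
import random


def no_bad_pick_bound(g, h, N):
    """Lower bound (1 - h/N)^g from the proof of Lemma 3, with N = m * mu_H^{(m)}
    half-edges and at most h of them to be avoided at each of the g picks."""
    return (1.0 - h / N) ** g


def simulate_no_bad_pick(g, h, N, reps, rng):
    """Fraction of runs in which g uniform picks (with replacement) from
    {0, ..., N-1} all avoid the fixed bad set {0, ..., h-1}."""
    hits = 0
    for _ in range(reps):
        if all(rng.randrange(N) >= h for _ in range(g)):
            hits += 1
    return hits / reps


# g(m) = 5 fixed and h(m) = 2 log m; mu_H = 24 is an arbitrary illustrative value.
bounds = [no_bad_pick_bound(5, 2 * math.log(m), m * 24) for m in (10**3, 10**6, 10**9)]
```

The entries of `bounds` increase towards 1 as $m$ grows, reflecting $g(m)h(m)m^{-1} \to 0$.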

6.4 Analysis of forward process 

6.4.1 Threshold theorem for the epidemic $\mathcal{E}^{(m)}$.

In order to prove a threshold theorem for the epidemic we first establish a
bound for the size of the bad set of half-edges after $k$ generations of the
epidemic $\mathcal{E}^{(m)}$. Recall (from the discussion at the end of Section 6.2.1) that
the number of half-edges in this set is bounded by $2\bar{T}_{k+1}^{(m)}$.

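Lemma 4. For all $\omega_1 \in A_1$,
$$\lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(\bar{T}_k^{(m)} \ge \log m) = 0 \quad (k = 1, 2, \ldots).$$

Proof. Fix $\omega_1 \in A_1$. Then note that $\mathbb{E}_{\mathbf{D}(\omega_1)}[\bar{T}_k^{(m)}] = \mu_r^{(m)}(\omega_1)\{1 + \mu_{\tilde{r}}^{(m)}(\omega_1) +
(\mu_{\tilde{r}}^{(m)}(\omega_1))^2 + \cdots + (\mu_{\tilde{r}}^{(m)}(\omega_1))^k\}$ and also that $\mu_r^{(m)}(\omega_1) \le \mu_{\tilde{r}}^{(m)}(\omega_1) + 1$
(since $\mu_{\tilde{r}}^{(m)} + 1 = \sum_k k^2 r_k^{(m)}/\mu_r^{(m)} \ge \mu_r^{(m)}$). Thus
$\mathbb{E}_{\mathbf{D}(\omega_1)}[\bar{T}_k^{(m)}] \le (k + 1)(\mu_{\tilde{r}}^{(m)}(\omega_1) + 1)^{k+1}$ and, by Markov's inequality,
$$\mathbb{P}_{\mathbf{D}(\omega_1)}(\bar{T}_k^{(m)} \ge \log m) \le \frac{1}{\log m}(k + 1)(\mu_{\tilde{r}}^{(m)}(\omega_1) + 1)^{k+1}.$$
The lemma now follows, since $\mu_{\tilde{r}}^{(m)}(\omega_1) \to \mu_{\tilde{r}}$ as $m \to \infty$ for all $\omega_1 \in A_1$, by
Lemma 1(iv). □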

For $m = 1, 2, \ldots$, let $Z^{(m)}$ denote the total number of households infected
in the epidemic $\mathcal{E}^{(m)}$, including the initial household, and let $\bar{Y}^{(m)}$ and $\bar{Y}$ be
the total progeny, including the initial individual, of the branching processes
$Y^{(m)}$ and $Y$, respectively. We now show that the total number of households
infected in $\mathcal{E}^{(m)}$ converges in distribution to the total progeny of $Y$.
Theorem 1. For $k = 1, 2, \ldots$,

(i) for all $\omega_1 \in A_1$, $\lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(Z^{(m)} = k) = \mathbb{P}(\bar{Y} = k)$;

(ii) $\lim_{m\to\infty} \mathbb{P}(Z^{(m)} = k) = \mathbb{P}(\bar{Y} = k)$.

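Proof. Fix $\omega_1 \in A_1$ and let $\Gamma^{(m)}$ be the number of households infected by
$\mathcal{E}^{(m)}$ before a bad half-edge is chosen. Fix $k \in \mathbb{N}$. Then
$$\mathbb{P}_{\mathbf{D}(\omega_1)}(Z^{(m)} = k) = \mathbb{P}_{\mathbf{D}(\omega_1)}(Z^{(m)} = k, \Gamma^{(m)} < k) + \mathbb{P}_{\mathbf{D}(\omega_1)}(Z^{(m)} = k, \Gamma^{(m)} \ge k). \qquad (6.6)$$
Let $\mathcal{J}_l^{(m)}$ ($l = 1, 2, \ldots$) be the set of half-edges we wish to avoid when choosing
the $l$th household to spread the epidemic to. Then $J_k^{(m)} = |\mathcal{J}_k^{(m)}| \le 2\bar{T}_k^{(m)}$,
so
$$\mathbb{P}_{\mathbf{D}(\omega_1)}(J_k^{(m)} \le 2\log m) \ge \mathbb{P}_{\mathbf{D}(\omega_1)}(\bar{T}_k^{(m)} \le \log m) \to 1$$
as $m \to \infty$, by Lemma 4. Thus, using Lemma 3 with $g(m) = k$ and $h(m) =
2\log m$, $\lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(\Gamma^{(m)} \ge k) = 1$. Therefore, $\lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(Z^{(m)} =
k, \Gamma^{(m)} < k) = 0$ and, recalling (6.6) and that $\mathcal{E}^{(m)}$ and $Y^{(m)}$ coincide on $\{\Gamma^{(m)} \ge k\}$
until at least $k + 1$ households have been infected,
$$\lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(Z^{(m)} = k) = \lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(Z^{(m)} = k, \Gamma^{(m)} \ge k) = \lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(\bar{Y}^{(m)} = k, \Gamma^{(m)} \ge k) = \lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(\bar{Y}^{(m)} = k) = \mathbb{P}(\bar{Y} = k),$$
using Lemmas 1(iii) and 2(i), proving assertion (i). Further,
$$\lim_{m\to\infty} \mathbb{P}(Z^{(m)} = k) = \lim_{m\to\infty} \mathbb{E}[\mathbb{P}_{\mathbf{D}}(Z^{(m)} = k)] = \mathbb{P}(\bar{Y} = k),$$
using the dominated convergence theorem, proving assertion (ii). □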



6.4.2 Early behaviour of major outbreaks. 

Theorem 1 shows that the total number of households infected in $\mathcal{E}^{(m)}$
converges in distribution as $m \to \infty$ to the total progeny of $Y$, so if $\mu_c \le 1$ only
minor outbreaks can occur in the limit as $m \to \infty$ (recall that $R_* = \mu_c$).
We now assume that $\mu_c > 1$ and study the early behaviour of $\mathcal{E}^{(m)}$ when a
major outbreak occurs. For $m = 1, 2, \ldots$, let
$$t_m = \lfloor 2\log\log m/\log\mu_c \rfloor.$$
We obtain a bound on the size of the 'bad set' of half-edges at time $t_m$ and
show that, with probability tending to 1 as $m \to \infty$, in a major outbreak
there are at least $\log m$ infected households after $t_m$ generations of the
epidemic process.

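Lemma 5. There exists $\beta \in (1, \infty)$ such that, for all $\omega_1 \in A_1$,
$$\lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(\bar{T}_{t_m+1}^{(m)} \ge (\log m)^\beta) = 0.$$

Proof. Fix $\omega_1 \in A_1$ and note that, for all sufficiently large $m$ (so that
$\mu_{\tilde{r}}^{(m)}(\omega_1) > 1$, which holds eventually because $\mu_{\tilde{r}} \ge \mu_c > 1$, as $\Phi_{\mathbf{d}} \le |\mathbf{d}| - 1$),
$$\mathbb{E}_{\mathbf{D}(\omega_1)}[\bar{T}_{t_m+1}^{(m)}] = \mu_r^{(m)}(\omega_1)\big(1 + \mu_{\tilde{r}}^{(m)}(\omega_1) + \cdots + (\mu_{\tilde{r}}^{(m)}(\omega_1))^{t_m+1}\big) \le \frac{\mu_r^{(m)}(\omega_1)(\mu_{\tilde{r}}^{(m)}(\omega_1))^{t_m+2}}{\mu_{\tilde{r}}^{(m)}(\omega_1) - 1}.$$
Thus, by Markov's inequality, for such $m$,
$$\mathbb{P}_{\mathbf{D}(\omega_1)}(\bar{T}_{t_m+1}^{(m)} \ge (\log m)^\beta) \le \frac{\mu_r^{(m)}(\omega_1)(\mu_{\tilde{r}}^{(m)}(\omega_1))^{t_m+2}}{(\log m)^\beta(\mu_{\tilde{r}}^{(m)}(\omega_1) - 1)}. \qquad (6.7)$$
It is readily shown, by considering its logarithm and using Lemma 1(iv),
that, for all sufficiently large $\beta$ (any $\beta > 2\log\mu_{\tilde{r}}/\log\mu_c$ will do), the right-hand side of (6.7) tends to 0 as
$m \to \infty$, and the lemma follows. □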
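A quick numerical look at $t_m$ and at the right-hand side of (6.7) shows how slowly these quantities move. The sketch below replaces the empirical means $\mu_r^{(m)}$ and $\mu_{\tilde{r}}^{(m)}$ by fixed values (an illustrative simplification, justified asymptotically by Lemma 1(iv)); the function names and the values $\mu_r = 2.5$, $\mu_{\tilde{r}} = 3$, $\mu_c = 2$ are assumptions chosen only so that $\mu_{\tilde{r}} \ge \mu_c > 1$.

```python
import math


def t_m(m, mu_c):
    """t_m = floor(2 log log m / log mu_c), for mu_c > 1 and m > e."""
    return math.floor(2 * math.log(math.log(m)) / math.log(mu_c))


def rhs_67(m, beta, mu_r, mu_r_tilde, mu_c):
    """Right-hand side of (6.7) with the empirical means replaced by fixed
    values mu_r and mu_r_tilde > 1 (illustrative simplification)."""
    t = t_m(m, mu_c)
    return mu_r * mu_r_tilde ** (t + 2) / (math.log(m) ** beta * (mu_r_tilde - 1))


# Here 2 log(mu_r_tilde) / log(mu_c) = 2 log 3 / log 2 (about 3.17), so beta = 4 suffices.
values = [rhs_67(10**e, 4.0, 2.5, 3.0, 2.0) for e in (6, 12, 24, 48, 96, 192)]
```

Because of the floor in $t_m$ the sequence need not decrease monotonically, but its logarithm behaves like $(2\log\mu_{\tilde{r}}/\log\mu_c - \beta)\log\log m$ plus a bounded term, so it tends to 0, only at a $\log\log m$ rate.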

Lemma 6. For all $\omega_1 \in A_1$,
$$\lim_{m\to\infty} \mathbb{P}_{\mathbf{D}(\omega_1)}(Y_{t_m}^{(m)} \ge \log m) = \mathbb{P}(\bar{Y} = \infty).$$

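Proof. Note that either (i) $\tilde{c}$ and $c$ both have infinite support, or (ii) $\tilde{c}$ and
$c$ are supported on $\{0, 1, \ldots, nd_{\max}\}$ and $\{0, 1, \ldots, nd_{\max} - 1\}$ (or subsets
thereof), respectively, where $d_{\max} = \max\{k : p_k > 0\}$.

Consider (i) first. For sufficiently small $\epsilon > 0$, let $\tilde{k}_0 = \min\{k : \sum_{i=k+1}^\infty \tilde{c}_i <
\epsilon\}$, $\tilde{\epsilon}' = \sum_{i=\tilde{k}_0+1}^\infty \tilde{c}_i$, $k_0 = \min\{k : \sum_{i=k+1}^\infty c_i < \epsilon\}$ and $\epsilon' = \sum_{i=k_0+1}^\infty c_i$,
so $\tilde{\epsilon}' < \epsilon$ and $\epsilon' < \epsilon$. Note that $\tilde{k}_0$ and $k_0$ are well defined whenever
$\epsilon < 1 - (\tilde{c}_0 \vee c_0)$ (where $a \vee b = \max(a, b)$), and also that both $\tilde{k}_0$ and
$k_0$ tend to $\infty$ as $\epsilon \downarrow 0$. Now let $Y^\epsilon = (Y_k^\epsilon, k = 0, 1, \ldots) \sim \mathrm{BP}(1, \tilde{c}^\epsilon, c^\epsilon)$,
where $\tilde{c}^\epsilon$ has elements $\tilde{c}_i^\epsilon = \tilde{c}_i + \tilde{\epsilon}'/(\tilde{k}_0 + 1)$ for $i = 0, 1, \ldots, \tilde{k}_0$ and $\tilde{c}_i^\epsilon = 0$ for $i > \tilde{k}_0$,
and $c^\epsilon = (c_i^\epsilon, i = 0, 1, \ldots)$ is defined similarly but with $\tilde{c}_i$, $\tilde{\epsilon}'$ and $\tilde{k}_0$ replaced
by $c_i$, $\epsilon'$ and $k_0$, respectively. Also let $\tilde{\mu}_\epsilon = \sum_{i=1}^\infty i\tilde{c}_i^\epsilon$ and $\mu_\epsilon = \sum_{i=1}^\infty i c_i^\epsilon$.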
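The truncation that defines $c^\epsilon$ is simple to carry out: find $k_0$, remove the tail mass $\epsilon'$ beyond it and spread that mass evenly over $0, \ldots, k_0$. The sketch below does this for a distribution stored as a (long) finite list, which is only an approximation of the infinite-support case (i); the function name is hypothetical and exact `Fraction` arithmetic is used so that the stochastic ordering can be checked exactly.

```python
from fractions import Fraction


def truncate_pmf(c, eps):
    """Lower-bounding pmf c^eps as in the proof of Lemma 6, case (i).

    c: pmf as a list (c[i] = P(i offspring)); eps: small positive number.
    k0 is the least k with sum_{i > k} c_i < eps; that tail mass eps' is
    spread evenly over 0, ..., k0 and the tail beyond k0 is removed.
    Returns [c^eps_0, ..., c^eps_{k0}] (entries beyond k0 are zero).
    """
    suffix = [0] * (len(c) + 1)  # suffix[i] = sum_{j >= i} c_j
    for i in range(len(c) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + c[i]
    k0 = next(k for k in range(len(c)) if suffix[k + 1] < eps)
    eps_prime = suffix[k0 + 1]
    return [c[i] + eps_prime / (k0 + 1) for i in range(k0 + 1)]


c = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
c_eps = truncate_pmf(c, Fraction(1, 5))  # [13/24, 7/24, 1/6]
```

Every partial sum of `c_eps` is at least the corresponding partial sum of `c`, so a branching process with offspring distribution $c^\epsilon$ is stochastically dominated by one with offspring distribution $c$; in this example the mean drops from $7/8$ to $5/8$.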

Now, note that Yli=o c i < Si=o c i = 0, 1, . . .), so /i e < /i c . We also 
have jj, £ = Yli=i i c i > Yli=i ~ > as e I 0, so /i e — )■ /x c as e I 0. Similarly, 
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p, £ — > fl c as e I 0. Now fix u\ G A\. Then, by Lemma l(iii), c^ m \ui) — > c 
and c^iui) — > c as m — > oo, so there exists M(e,u;i) such that, for all 
m > M(e,ui), c' m '(wi) < cf, for i = l,2,...,k , and 0^(001) < cf, for 
i = 1, 2, . . . , k . Thus, for m > M(e, ooi) and k — 0, 1, . . ., Ei=o ^"^(^i) < 
ELo c ! and E?=o^ (m) ( w i) < E?=o 5 f ( note > for example, that ELo c ! = 1 

St St 

if k > ho), whence Y^ m >{uj\) > Y 6 , where > denotes stochastic ordering. 
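Therefore, for $\omega_1\in A_1$ and $m\ge M(\varepsilon,\omega_1)$,

$$P_{D(\omega_1)}\big(Y^{(m)}_{t_m}>\log m\big)\ge P\big(Y^{\varepsilon}_{t_m}>\log m\big)=P\left(\frac{Y^{\varepsilon}_{t_m}}{\tilde\mu_{\varepsilon}\mu_{\varepsilon}^{t_m-1}}>\frac{\log m}{\tilde\mu_{\varepsilon}\mu_{\varepsilon}^{t_m-1}}\right).\qquad(6.8)$$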

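Now, note that $\sum_{i=1}^{\infty}c^{\varepsilon}_i\,i\log i<\infty$ (the summand is $0$ for $i>k_0$), so by the well-known result concerning the exponential growth of branching processes (see, for example, Haccou et al. (2005, Theorem 6.1)), there exists a random variable $W^{\varepsilon}$, which takes the value $0$ if and only if $Y^{\varepsilon}$ goes extinct (i.e. if and only if $\bar Y^{\varepsilon}=\sum_{i=0}^{\infty}Y^{\varepsilon}_i<\infty$), such that

$$\frac{Y^{\varepsilon}_k}{\tilde\mu_{\varepsilon}\mu_{\varepsilon}^{k-1}}\to W^{\varepsilon}\quad\text{as }k\to\infty.$$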



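Next, since $t_m=\lfloor 2\log\log m/\log\mu_c\rfloor$, observe that, for suitable $\theta_m\in[0,1)$,

$$\log\frac{\log m}{\tilde\mu_{\varepsilon}\mu_{\varepsilon}^{t_m-1}}=\left(1-\frac{2\log\mu_{\varepsilon}}{\log\mu_c}\right)\log\log m+\log\frac{\mu_{\varepsilon}}{\tilde\mu_{\varepsilon}}+\theta_m\log\mu_{\varepsilon}.$$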
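The decomposition of $\log\big(\log m/(\tilde\mu_\varepsilon\mu_\varepsilon^{t_m-1})\big)$ into a $\log\log m$ term, a constant and a bounded term $\theta_m\log\mu_\varepsilon$ can be checked numerically. In the sketch below $\theta_m$ is taken to be the fractional part discarded by the floor in $t_m$; the values $\mu_c=1.5$, $\mu_\varepsilon=1.45$ and $\tilde\mu_\varepsilon=2$ are hypothetical and chosen so that $\log\mu_\varepsilon/\log\mu_c>1/2$.

```python
import math

def lhs(m, mu_c, mu_eps, mu_tilde_eps):
    """log( log m / (mu_tilde_eps * mu_eps**(t_m - 1)) ), expanded as a sum of logs."""
    t_m = math.floor(2 * math.log(math.log(m)) / math.log(mu_c))
    return math.log(math.log(m)) - math.log(mu_tilde_eps) - (t_m - 1) * math.log(mu_eps)

def rhs(m, mu_c, mu_eps, mu_tilde_eps):
    """The three-term decomposition, with theta_m the fractional part lost in the floor."""
    x = 2 * math.log(math.log(m)) / math.log(mu_c)
    theta_m = x - math.floor(x)  # theta_m lies in [0, 1)
    return ((1 - 2 * math.log(mu_eps) / math.log(mu_c)) * math.log(math.log(m))
            + math.log(mu_eps / mu_tilde_eps) + theta_m * math.log(mu_eps))

for k in (5, 20, 80, 200):
    m = 10**k
    print(k, round(lhs(m, 1.5, 1.45, 2.0), 6), round(rhs(m, 1.5, 1.45, 2.0), 6))
```

Because the coefficient of $\log\log m$ is negative here, both columns drift towards $-\infty$, albeit slowly and with bounded oscillation from $\theta_m$.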

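Recalling that $\mu_{\varepsilon}\to\mu_c$ as $\varepsilon\downarrow 0$, we see that, for sufficiently small $\varepsilon$, $\log\mu_{\varepsilon}/\log\mu_c>1/2$ and thus $\log m/(\tilde\mu_{\varepsilon}\mu_{\varepsilon}^{t_m-1})\to 0$ as $m\to\infty$. It then follows from (6.8) that, for such $\varepsilon$,

$$\liminf_{m\to\infty}P_{D(\omega_1)}\big(Y^{(m)}_{t_m}>\log m\big)\ge P(W^{\varepsilon}>0)=P(\bar Y^{\varepsilon}=\infty).\qquad(6.9)$$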

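Now, $c^{\varepsilon}\to c$ and $\tilde c^{\varepsilon}\to\tilde c$ as $\varepsilon\downarrow 0$, so letting $\varepsilon\downarrow 0$ in (6.9) and using Lemma 2(ii) yields

$$\liminf_{m\to\infty}P_{D(\omega_1)}\big(Y^{(m)}_{t_m}>\log m\big)\ge P(\bar Y=\infty).\qquad(6.10)$$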

Now, for $k=1,2,\dots$,

$$\limsup_{m\to\infty}P_{D(\omega_1)}\big(Y^{(m)}_{t_m}>\log m\big)\le\limsup_{m\to\infty}P_{D(\omega_1)}\big(\bar Y^{(m)}>\log m\big)\le\limsup_{m\to\infty}P_{D(\omega_1)}\big(\bar Y^{(m)}>k\big)=P(\bar Y>k),$$

using Lemmas 1(iii) and 2(i). Letting $k\to\infty$ then yields

$$\limsup_{m\to\infty}P_{D(\omega_1)}\big(Y^{(m)}_{t_m}>\log m\big)\le P(\bar Y=\infty),$$

which, together with (6.10), establishes the lemma.

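In case (ii), a suitable lower bounding branching process is obtained by setting, for $\varepsilon<\tilde c_{nd_{\max}}\wedge c_{nd_{\max}-1}$ (where $a\wedge b=\min(a,b)$), $\tilde c^{\varepsilon}_i=\tilde c_i+\varepsilon/(nd_{\max})$ $(i=0,1,\dots,nd_{\max}-1)$, $\tilde c^{\varepsilon}_{nd_{\max}}=\tilde c_{nd_{\max}}-\varepsilon$, $c^{\varepsilon}_i=c_i+\varepsilon/(nd_{\max}-1)$ $(i=0,1,\dots,nd_{\max}-2)$ and $c^{\varepsilon}_{nd_{\max}-1}=c_{nd_{\max}-1}-\varepsilon$; (6.10) then follows as above. □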
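The case (i) construction of $c^{\varepsilon}$ (cut the tail beyond $k_0$ and spread its mass $\varepsilon'$ uniformly over $\{0,\dots,k_0\}$) is easy to sketch. The example assumes a Poisson(2) law truncated at 60 as a stand-in for $c$; it checks that $c^{\varepsilon}$ is a probability distribution, that its cumulative sums dominate those of $c$ (so $c^{\varepsilon}$ is stochastically smaller), and that $\mu_\varepsilon\to\mu_c$ as $\varepsilon\downarrow 0$. The helper names are hypothetical.

```python
import math

def poisson_pmf(lam, kmax):
    """Poisson(lam) pmf on 0..kmax, renormalised to sum to 1 (a truncated stand-in for c)."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    total = sum(pmf)
    return [p / total for p in pmf]

def lower_bounding_law(c, eps):
    """Build c^eps: k0 = min{k : sum_{i>k} c_i < eps}; move the tail mass eps'
    onto {0, ..., k0} in equal shares and set c^eps_i = 0 for i > k0."""
    if not 0 < eps < 1 - c[0]:
        raise ValueError("need 0 < eps < 1 - c_0")
    k0 = next(k for k in range(len(c)) if sum(c[k + 1:]) < eps)
    eps_prime = sum(c[k0 + 1:])
    return [c[i] + eps_prime / (k0 + 1) if i <= k0 else 0.0 for i in range(len(c))]

def mean(pmf):
    return sum(i * p for i, p in enumerate(pmf))

def cdf(pmf):
    out, acc = [], 0.0
    for p in pmf:
        acc += p
        out.append(acc)
    return out

c = poisson_pmf(2.0, 60)
for eps in (0.2, 0.05, 0.01, 1e-6):
    print(eps, round(mean(lower_bounding_law(c, eps)), 6), round(mean(c), 6))
```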

For $m=1,2,\dots$ and $k=0,1,\dots$, let $Z^{(m)}_k$ denote the number of infectious households in generation $k$ of $E^{(m)}$.

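Lemma 7. Let $\beta$ be as in Lemma 5. Then, for all $\omega_1\in A_1$,

$$\lim_{m\to\infty}P_{D(\omega_1)}\big(Z^{(m)}_{t_m}>\log m,\ T^{(m)}_{t_m+1}<(\log m)^{\beta}\big)=P(\bar Y=\infty).\qquad(6.11)$$

Proof. Lemmas 5 and 6 show that (6.11) holds with $Z^{(m)}_{t_m}$ replaced by $Y^{(m)}_{t_m}$. Application of Lemma 3, with $g(m)=(\log m)^{\beta}$ and $h(m)=2(\log m)^{\beta}$, then shows that $\lim_{m\to\infty}P_{D(\omega_1)}\big(Z^{(m)}_{t_m}=Y^{(m)}_{t_m}\big)=1$, and the assertion follows. □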

Corollary 1. (i) For all $\omega_1\in A_1$, $\lim_{m\to\infty}P_{D(\omega_1)}\big(Z^{(m)}\ge\log m\big)=P(\bar Y=\infty)$;

(ii) $\lim_{m\to\infty}P\big(Z^{(m)}\ge\log m\big)=P(\bar Y=\infty)$.

Proof. Fix $\omega_1\in A_1$. For $k=1,2,\dots$,

$$\limsup_{m\to\infty}P_{D(\omega_1)}\big(Z^{(m)}\ge\log m\big)\le\limsup_{m\to\infty}P_{D(\omega_1)}\big(Z^{(m)}>k\big)=P(\bar Y>k)\quad\text{(using Theorem 1(i))},$$

and letting $k\to\infty$ yields

$$\limsup_{m\to\infty}P_{D(\omega_1)}\big(Z^{(m)}\ge\log m\big)\le P(\bar Y=\infty).$$

Also,

$$\liminf_{m\to\infty}P_{D(\omega_1)}\big(Z^{(m)}\ge\log m\big)\ge\liminf_{m\to\infty}P_{D(\omega_1)}\big(Z^{(m)}_{t_m}>\log m\big)=P(\bar Y=\infty)\quad\text{(using Lemmas 5 and 7)},$$

and assertion (i) follows. Assertion (ii) then follows using the dominated convergence theorem, as in the proof of Theorem 1(ii). □
Note that Theorem 1 and Corollary 1 imply that if $(h_m)$ is any sequence of real numbers satisfying $h_m\to\infty$ as $m\to\infty$ and $h_m\le\log m$ for all $m$, then $\lim_{m\to\infty}P_{D(\omega_1)}\big(Z^{(m)}\in[h_m,\log m)\big)=0$ for all $\omega_1\in A_1$ and $\lim_{m\to\infty}P\big(Z^{(m)}\in[h_m,\log m)\big)=0$. Thus, for $m=1,2,\dots$, it is natural to define a major outbreak as one which infects at least $\log m$ households, i.e. as one in which the event $G^{(m)}=\{\omega\in\Omega:Z^{(m)}(\omega)\ge\log m\}$ occurs. Let $\tilde G^{(m)}=\{\omega\in\Omega:Z^{(m)}_{t_m}(\omega)>\log m,\ T^{(m)}_{t_m+1}(\omega)<(\log m)^{\beta}\}$, where $\beta$ is as in Lemma 5. Clearly $\tilde G^{(m)}\subseteq G^{(m)}$, and Lemma 5 and Corollary 1 imply $\lim_{m\to\infty}P_{D(\omega_1)}\big(G^{(m)}\setminus\tilde G^{(m)}\big)=0$ for all $\omega_1\in A_1$ and $\lim_{m\to\infty}P\big(G^{(m)}\setminus\tilde G^{(m)}\big)=0$, so we can take $\tilde G^{(m)}$ as our working definition of a major outbreak.

6.5 Analysis of backward process 
6.5.1 Lower bounding branching processes. 

We now analyse the 'backward' process, which describes the generation-wise growth of the susceptibility set (and its neighbours) of a typical individual that is susceptible at time $t_m$ in the forward process, in order to find the asymptotic probability that such an individual is ultimately infected, given that a major outbreak occurs (i.e. $Z^{(m)}_{t_m}>\log m$ and $T^{(m)}_{t_m+1}<(\log m)^{\beta}$, where $\beta$ is as in Lemma 5). To this end, it is fruitful to have, for all sufficiently small $\varepsilon>0$, a branching process ${}_{\varepsilon}X^{(m)}$ which asymptotically bounds the susceptibility set process from below until the susceptibility set covers a proportion $\varepsilon$ of the households in the population (cf. Whittle (1955)). To do this, we need an almost sure bound, $\bar\eta(\varepsilon)$, for the proportion of households that are neighbours of the susceptibility set when the size (in terms of households) of the susceptibility set is at most $\varepsilon m$; we now obtain such a bound.

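Suppose that $D$ has infinite support. Recall the definitions of $p_H(\cdot)$ and $\tilde p_H(\cdot)$ from Section 6.2.3. Let $k_1=\min\{k:p_H(k)>0\}$ and $\varepsilon_0=1-p_H(k_1)-p_H(k_1+1)$. Then, for $\varepsilon\in(0,\varepsilon_0)$, let $\kappa(\varepsilon)=\max\{k:\sum_{j=k_1}^{k}p_H(j)\in(0,1-\varepsilon)\}$, $\kappa^*(\varepsilon)=\max\{k<\kappa(\varepsilon):p_H(k)>0\}$ and $\eta(\varepsilon)=\sum_{i=\kappa^*(\varepsilon)}^{\infty}\tilde p_H(i)$. (The definition of $\kappa^*(\varepsilon)$ requires $\kappa(\varepsilon)>k_1$, which is ensured by $\varepsilon<\varepsilon_0$.) Note that $\eta(\varepsilon)\downarrow 0$ as $\varepsilon\downarrow 0$. Let $\bar\eta(\varepsilon)=2n\mu_D\eta(\varepsilon)$ and, for $m=1,2,\dots$, let $H^{(m)}_{(1)}\le H^{(m)}_{(2)}\le\cdots\le H^{(m)}_{(m)}$ be the order statistics of the household degrees $H^{(m)}_1,H^{(m)}_2,\dots,H^{(m)}_m$.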

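Lemma 8. For any $\omega_1\in A_1$ and $\varepsilon\in(0,\varepsilon_0)$,

$$\frac{1}{m\mu^{(m)}_H(\omega_1)}\sum_{k=m-\lfloor\varepsilon m\rfloor+1}^{m}H^{(m)}_{(k)}(\omega_1)<\eta(\varepsilon)\qquad(6.12)$$

and

$$\frac{1}{m}\sum_{k=m-\lfloor\varepsilon m\rfloor+1}^{m}H^{(m)}_{(k)}(\omega_1)<\bar\eta(\varepsilon)\qquad(6.13)$$

for all sufficiently large $m$.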

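Proof. Fix $\omega_1\in A_1$ and note that, for $k=0,1,\dots$,

$$\begin{aligned}\lim_{m\to\infty}\frac{1}{m}\sum_{i=1}^{m}1_{\{H^{(m)}_i(\omega_1)=k\}}&=\lim_{m\to\infty}\frac{1}{m}\sum_{i=1}^{m}\sum_{\{d:|d|=k\}}1_{\{D^{(m)}_i(\omega_1)=d\}}\\&=\lim_{m\to\infty}\sum_{\{d:|d|=k\}}p^{(m)}_d(\omega_1)\\&=\sum_{\{d:|d|=k\}}p_d\qquad\text{(using Lemma 1(ii))}\\&=p_H(k),\end{aligned}\qquad(6.14)$$

whence

$$\lim_{m\to\infty}\frac{1}{m}\sum_{i=1}^{m}1_{\{H^{(m)}_i(\omega_1)\ge\kappa(\varepsilon)+1\}}=1-\lim_{m\to\infty}\frac{1}{m}\sum_{i=1}^{m}\sum_{j=0}^{\kappa(\varepsilon)}1_{\{H^{(m)}_i(\omega_1)=j\}}=\sum_{j=\kappa(\varepsilon)+1}^{\infty}p_H(j).$$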

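Thus, since $\sum_{j=\kappa(\varepsilon)+1}^{\infty}p_H(j)>\varepsilon$ (by the definition of $\kappa(\varepsilon)$), we have, for all sufficiently large $m$, say $m\ge N_0(\varepsilon,\omega_1)$, that $m^{-1}\sum_{i=1}^{m}1_{\{H^{(m)}_i(\omega_1)\ge\kappa(\varepsilon)+1\}}>\varepsilon$, whence $H^{(m)}_{(m-\lfloor\varepsilon m\rfloor+1)}(\omega_1)>\kappa(\varepsilon)$. Hence, for $m\ge N_0(\varepsilon,\omega_1)$,

$$\begin{aligned}\frac{1}{m\mu^{(m)}_H(\omega_1)}\sum_{k=m-\lfloor\varepsilon m\rfloor+1}^{m}H^{(m)}_{(k)}(\omega_1)&\le\frac{1}{m\mu^{(m)}_H(\omega_1)}\sum_{i=1}^{m}\sum_{k=\kappa(\varepsilon)+1}^{\infty}k\,1_{\{H^{(m)}_i(\omega_1)=k\}}\\&=\frac{m\mu^{(m)}_H(\omega_1)-\sum_{i=1}^{m}\sum_{k=1}^{\kappa(\varepsilon)}k\,1_{\{H^{(m)}_i(\omega_1)=k\}}}{m\mu^{(m)}_H(\omega_1)}\\&=1-\frac{1}{\mu^{(m)}_H(\omega_1)}\sum_{k=0}^{\kappa(\varepsilon)}\frac{k}{m}\sum_{i=1}^{m}1_{\{H^{(m)}_i(\omega_1)=k\}}\\&\to1-\sum_{k=1}^{\kappa(\varepsilon)}\tilde p_H(k)=\sum_{k=\kappa(\varepsilon)+1}^{\infty}\tilde p_H(k)\end{aligned}$$

as $m\to\infty$, using (6.14) and Lemma 1(i). Assertion (6.12) follows upon recalling the definition of $\kappa^*(\varepsilon)$. The second assertion (6.13) follows from the first after applying Lemma 1(i) and recalling the definition of $\bar\eta(\varepsilon)$. □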

Remarks.

1. If $d_{\max}<\infty$ (i.e. $D$ has finite support) then it is readily seen that Lemma 8 holds with $\eta(\varepsilon)=2d_{\max}\varepsilon/\mu_D$ and $\bar\eta(\varepsilon)=nd_{\max}\varepsilon$.

2. For $\omega_1\in A_1$, Lemma 8 provides, for all sufficiently large $m$, a bound for the number of half-edges that emanate from households in a susceptibility set (and hence also for the number of households neighbouring a susceptibility set), if the susceptibility set contains no more than $\varepsilon m$ households. The number of such half-edges, $H^{(m)}(\varepsilon)$ say, is given by the sum of the degrees of the households in the susceptibility set, which is bounded by the sum of the degrees of the $\lfloor\varepsilon m\rfloor$ households of highest degree. Thus, by (6.13), $H^{(m)}(\varepsilon)<m\bar\eta(\varepsilon)$ for all sufficiently large $m$.
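The constants in Lemma 8 can be computed, and (6.12)–(6.13) checked on a simulated population, in a simple special case. The sketch below assumes that individual degrees are i.i.d. Poisson($\lambda$), so that the household degree $H$ is Poisson($n\lambda$) and has infinite support, with $\tilde p_H(k)=kp_H(k)/(n\mu_D)$ and $\mu_D=\lambda$. The values $n=2$, $\lambda=3$, $\varepsilon=0.1$ and $m=20000$, the truncation of $p_H$ at 100 and all function names are assumptions for illustration; one seeded sample plays the role of a single realisation $\omega_1$.

```python
import math
import random

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for 0..kmax via p(k) = p(k-1) * lam / k."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def lemma8_constants(p_H, eps, n, mu_D):
    """Return kappa(eps), kappa*(eps), eta(eps) and eta_bar(eps) = 2 n mu_D eta(eps)."""
    k1 = next(k for k, p in enumerate(p_H) if p > 0)
    eps0 = 1 - p_H[k1] - p_H[k1 + 1]
    if not 0 < eps < eps0:
        raise ValueError("need 0 < eps < eps_0")
    cum, kappa = 0.0, k1
    for k in range(k1, len(p_H)):
        cum += p_H[k]
        if cum >= 1 - eps:
            break
        kappa = k  # largest k whose partial sum lies in (0, 1 - eps)
    kappa_star = max(k for k in range(kappa) if p_H[k] > 0)
    # Size-biased law: tilde p_H(i) = i p_H(i) / (n mu_D).
    eta = sum(i * p_H[i] for i in range(kappa_star, len(p_H))) / (n * mu_D)
    return kappa, kappa_star, eta, 2 * n * mu_D * eta

def top_eps_sums(H, eps):
    """Left-hand sides of (6.12) and (6.13): degree mass of the floor(eps*m) largest households."""
    m = len(H)
    top = sorted(H)[m - math.floor(eps * m):]
    return sum(top) / sum(H), sum(top) / m

n, lam, eps, m = 2, 3.0, 0.1, 20000
rng = random.Random(2024)
H = [sum(sample_poisson(lam, rng) for _ in range(n)) for _ in range(m)]
p_H = poisson_pmf(n * lam, 100)  # H ~ Poisson(n * lam) under the Poisson-degree assumption
kappa, kappa_star, eta, eta_bar = lemma8_constants(p_H, eps, n, lam)
share, per_household = top_eps_sums(H, eps)
print(kappa, kappa_star, round(eta, 4), round(eta_bar, 4))
print(round(share, 4), round(per_household, 4))
```

For these inputs the bounds are far from tight, which is expected: $\kappa^*(\varepsilon)<\kappa(\varepsilon)$ and the factor 2 in $\bar\eta(\varepsilon)$ are there to give strict inequalities for large $m$, not sharp constants.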

Recall from Section 6.2.2 that the coupling of the susceptibility set process and its approximating branching process $X^{(m)}$ breaks down when a half-edge is sampled that emanates from an appropriate 'bad' set of households. This can happen in two fundamentally different ways. First, a half-edge through which we try to extend the susceptibility set may be paired with another half-edge through which we want to extend the susceptibility set in the same generation. Note that in this case neither of the two half-edges concerned actually extends the susceptibility set. Second, the half-edge may be paired with a bad half-edge that is not one through which we wish to extend the susceptibility set in the current generation, in which case the susceptibility set may still be extended, though the offspring distribution differs from that in the branching process. We treat these two cases in turn.

For $m=1,2,\dots$ and $k=0,1,\dots$, let $\bar X^{(m)}_k=\sum_{i=0}^{k}X^{(m)}_i$ be the total number of individuals that have lived in the approximating branching process by time $k$, and let $\bar S^{(m)}_k=\sum_{i=0}^{k}S^{(m)}_i$ be the total number of households in the susceptibility set process up to and including generation $k$. Further, let $\bar X^{(m)}=\sum_{i=0}^{\infty}X^{(m)}_i$ and $\bar S^{(m)}=\sum_{i=0}^{\infty}S^{(m)}_i$. Suppose that $\omega_1\in A_1$. Then, for all sufficiently large $m$, while $\bar S^{(m)}_k\le\varepsilon m$, the probability that a half-edge is paired with another half-edge through which we want to extend the susceptibility set in the same generation is no more than $\eta(\varepsilon)$. For such $m$, suppose that at some generation $k$ there are $i$ 'live' half-edges through which we attempt to extend the susceptibility set. Denote by $Y_L$ the number of these half-edges that do not pair up with another of these $i$ live half-edges, and let $\tilde Y_L\sim\mathrm{Bin}(i,1-\sqrt{\eta(\varepsilon)})$. We now show that $Y_L\ge_{\mathrm{st}}\tilde Y_L$.

First, define another random variable, $\hat Y_L$, as follows. Take a live half-edge; then with probability $\eta(\varepsilon)$ pair it with another live half-edge, otherwise it 'survives' to be connected with a non-live half-edge. Repeat this process until all live half-edges have been either paired up or designated to survive. Note that if there is a single live half-edge left at the end of this procedure, it must survive. Let $\hat Y_L$ be the number of surviving half-edges under this regime. Since the proportion of half-edges that are actually live is less than $\eta(\varepsilon)$, $Y_L\ge_{\mathrm{st}}\hat Y_L$. We now show that $\hat Y_L\ge_{\mathrm{st}}\tilde Y_L$ by describing these two random variables as the numbers of renewals of discrete-time renewal processes by time $i$, and showing that the corresponding lifetime distributions, $\hat T$ and $\tilde T$ say, satisfy $\hat T\le_{\mathrm{st}}\tilde T$. We achieve this by taking a lifetime in the renewal process to be the number of half-edges examined to find a surviving half-edge. It is immediate that $P(\tilde T=k)=(1-\eta(\varepsilon)^{1/2})\eta(\varepsilon)^{(k-1)/2}$, $k=1,2,\dots$. Now, since pairing one live half-edge with another obviously uses up two half-edges, $\hat T$ cannot take even values and $P(\hat T=2k+1)=(1-\eta(\varepsilon))\eta(\varepsilon)^{k}$, $k=0,1,\dots$. Elementary calculation gives $P(\hat T>k)=\eta(\varepsilon)^{\lceil k/2\rceil}\le\eta(\varepsilon)^{k/2}=P(\tilde T>k)$, $k=1,2,\dots$, so $\hat T\le_{\mathrm{st}}\tilde T$, whence $\hat Y_L\ge_{\mathrm{st}}\tilde Y_L$.
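The "elementary calculation" comparing the two lifetime laws can be confirmed numerically. The sketch below sums the two probability mass functions directly (truncating series whose remaining terms are negligible for the $\eta$ values used) and compares the tails with the closed forms $\eta^{\lceil k/2\rceil}$ and $\eta^{k/2}$. The function names and the grid of $\eta$ values are illustrative assumptions.

```python
import math

def tail_pairing(eta, k, terms=400):
    """P(hat T > k), where P(hat T = 2j+1) = (1 - eta) * eta**j for j = 0, 1, ..."""
    return sum((1 - eta) * eta**j for j in range(terms) if 2 * j + 1 > k)

def tail_binomial_thinning(eta, k, terms=800):
    """P(tilde T > k), where P(tilde T = t) = (1 - r) * r**(t - 1) and r = sqrt(eta)."""
    r = math.sqrt(eta)
    return sum((1 - r) * r**(t - 1) for t in range(1, terms + 1) if t > k)

for eta in (0.05, 0.2, 0.5):
    worst = max(tail_pairing(eta, k) - tail_binomial_thinning(eta, k) for k in range(1, 31))
    print(eta, worst)  # non-positive up to rounding: hat T is stochastically smaller
```

The two tails coincide at even $k$ and the pairing tail is strictly smaller at odd $k$, which is exactly the comparison used in the text.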

The above argument shows that, in a given generation, the number of half-edges that survive to be paired with non-live half-edges is stochastically larger than if they survived independently with probability $1-\sqrt{\eta(\varepsilon)}$. Now consider a live half-edge that survives this first stage and thus is paired with a half-edge chosen uniformly at random from all the non-live half-edges. The probability that it avoids being paired with a half-edge from the bad set is therefore larger than if it were paired with a half-edge chosen uniformly at random from all of the half-edges. Recall that $\tilde p^{(m)}_d$ is the probability that a half-edge chosen at random from all $m\mu^{(m)}_H$ half-edges in the population emanates from a household of type $d$. Further, for $\omega_1\in A_1$ and $m$ sufficiently large, conditional on choosing a household of type $d$, if $\bar S^{(m)}_k\le\varepsilon m$ and $T^{(m)}_{t_m+1}<(\log m)^{\beta}$, then the probability of choosing a bad household is bounded above by

$$\gamma^{(m)}_d(\varepsilon)=\min\left\{1,\ \frac{(\log m)^{\beta}+\varepsilon m+\bar\eta(\varepsilon)m}{mp^{(m)}_d}\right\}.$$

This bound is obtained by noting that, under the stated conditions, there are fewer than $(\log m)^{\beta}$ bad households from the forward process, fewer than $\varepsilon m$ households in the susceptibility set and fewer than $\bar\eta(\varepsilon)m$ households that are neighbours of the susceptibility set, and then assuming that all of these bad households are of type $d$.
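The bound $\gamma^{(m)}_d(\varepsilon)$ is capped at 1 for small $m$ and approaches $\min\{1,(\varepsilon+\bar\eta(\varepsilon))/p_d\}$ as $m\to\infty$, since the $(\log m)^{\beta}$ term is negligible relative to $m$. The sketch below illustrates this, assuming the type-$d$ proportion $p^{(m)}_d$ is held fixed at $p_d$; the values $\varepsilon=0.01$, $\bar\eta(\varepsilon)=0.05$, $p_d=0.3$ and $\beta=3$ are hypothetical and are not derived from Lemma 8.

```python
import math

def gamma_m(m, eps, eta_bar, p_d, beta):
    """gamma_d^{(m)}(eps) = min{1, ((log m)^beta + eps*m + eta_bar*m) / (m * p_d)},
    with the type-d proportion p_d^{(m)} held fixed at p_d (an assumption)."""
    return min(1.0, (math.log(m) ** beta + eps * m + eta_bar * m) / (m * p_d))

def gamma_limit(eps, eta_bar, p_d):
    """Limiting value min{1, (eps + eta_bar) / p_d}, valid for p_d > 0."""
    return min(1.0, (eps + eta_bar) / p_d)

for m in (10**3, 10**5, 10**8):
    print(m, gamma_m(m, eps=0.01, eta_bar=0.05, p_d=0.3, beta=3))
print("limit", gamma_limit(0.01, 0.05, 0.3))
```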

It follows from this discussion that, for $m$ sufficiently large and if $T^{(m)}_{t_m+1}<(\log m)^{\beta}$, then while $\bar S^{(m)}_k\le\varepsilon m$ the susceptibility set process is stochastically larger than a branching process, ${}_{\varepsilon}X^{(m)}$ say, in which each potential birth (live half-edge) is aborted independently with probability $\sqrt{\eta(\varepsilon)}$, and the potential offspring (live half-edges) of an unaborted birth are obtained by first sampling $d$ according to $\tilde p^{(m)}_d$; then, with probability $\gamma^{(m)}_d(\varepsilon)$, this unaborted birth is aborted at this stage, and otherwise its potential offspring is distributed according to the random variable $\xi_d$ defined at the end of the paragraph following (6.3).

The number, $x^{(m)}_0$ say, of potential births that emanate from the initial individual in the susceptibility set may be found as follows. First, a household is chosen uniformly at random from the households not infected by time $t_m$ in the forward process. Suppose that this household is of type $d$. Then, if this household is not a neighbour of a household in the forward process, $x^{(m)}_0$ is distributed according to the random variable $\tilde\xi_d$, also defined in the paragraph immediately following (6.3). If the sampled household is a neighbour of a household in the forward process then $x^{(m)}_0$ has a different distribution. Suppose that $T^{(m)}_{t_m+1}<(\log m)^{\beta}$. Then the number of households that are neighbours of the forward process is less than $2(\log m)^{\beta}$, and it follows that $x^{(m)}_0$ is stochastically larger than a random variable, $\underline{x}^{(m)}_0$ say, obtained by first sampling $d$ according to $p^{(m)}_d$ and then setting $\underline{x}^{(m)}_0=0$ with probability $\gamma^{(m)}_d=\min\{1,2(\log m)^{\beta}/(mp^{(m)}_d)\}$; otherwise $\underline{x}^{(m)}_0$ is distributed according to $\tilde\xi_d$.

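Assume that there is a single ancestor in the branching process ${}_{\varepsilon}X^{(m)}$, whose number of potential offspring is distributed as $\underline{x}^{(m)}_0$. We now have a complete description of how ${}_{\varepsilon}X^{(m)}$ evolves. Let ${}_{\varepsilon}\bar X^{(m)}$ and ${}_{\varepsilon}W^{(m)}$ be, respectively, the total numbers of potential and unaborted births in ${}_{\varepsilon}X^{(m)}$. Recall the event $\tilde G^{(m)}$ defined at the end of Section 6.4.2, giving our working definition of a major outbreak. The above arguments show that

$$P_{D(\omega_1)}\big(W^{(m)}\ge\lfloor\varepsilon m\rfloor\mid\tilde G^{(m)}\big)\ge P_{D(\omega_1)}\big({}_{\varepsilon}W^{(m)}\ge\lfloor\varepsilon m\rfloor\big)\ge P_{D(\omega_1)}\big({}_{\varepsilon}\bar X^{(m)}=\infty\big).\qquad(6.15)$$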

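For the branching process ${}_{\varepsilon}X^{(m)}$, let ${}_{\varepsilon}b^{(m)}=({}_{\varepsilon}b^{(m)}_0,{}_{\varepsilon}b^{(m)}_1,\dots)$ denote the distribution of the number of potential offspring of the initial individual and let ${}_{\varepsilon}\hat b^{(m)}=({}_{\varepsilon}\hat b^{(m)}_0,{}_{\varepsilon}\hat b^{(m)}_1,\dots)$ denote the distribution of the number of potential offspring of a typical potential birth. Then

$$\begin{aligned}{}_{\varepsilon}b^{(m)}_0&=\sum_{d\in\mathbb{Z}^n_+}p^{(m)}_d\Big(\gamma^{(m)}_d+\big(1-\gamma^{(m)}_d\big)P(\tilde\xi_d=0)\Big),\\{}_{\varepsilon}b^{(m)}_k&=\sum_{d\in\mathbb{Z}^n_+}p^{(m)}_d\big(1-\gamma^{(m)}_d\big)P(\tilde\xi_d=k)\qquad(k=1,2,\dots),\end{aligned}$$

and

$$\begin{aligned}{}_{\varepsilon}\hat b^{(m)}_0&=\sqrt{\eta(\varepsilon)}+\big(1-\sqrt{\eta(\varepsilon)}\big)\sum_{d\in\mathbb{Z}^n_+}\tilde p^{(m)}_d\Big(\gamma^{(m)}_d(\varepsilon)+\big(1-\gamma^{(m)}_d(\varepsilon)\big)P(\xi_d=0)\Big),\\{}_{\varepsilon}\hat b^{(m)}_k&=\big(1-\sqrt{\eta(\varepsilon)}\big)\sum_{d\in\mathbb{Z}^n_+}\tilde p^{(m)}_d\big(1-\gamma^{(m)}_d(\varepsilon)\big)P(\xi_d=k)\qquad(k=1,2,\dots).\end{aligned}$$

Note that ${}_{\varepsilon}b^{(m)}$ does not depend on $\varepsilon$; however, it is distinct from $b^{(m)}$, and we retain the notation ${}_{\varepsilon}b^{(m)}$ to indicate that it is associated with the branching process ${}_{\varepsilon}X^{(m)}$.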

The following lemma is useful for determining the limits of the distributions ${}_{\varepsilon}b^{(m)}$ and ${}_{\varepsilon}\hat b^{(m)}$ as $m\to\infty$. Its proof is standard and is hence omitted.

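Lemma 9. Suppose that, for all $d\in\mathbb{Z}^n_+$ and $m=1,2,\dots$, the real numbers

(i) $p^{(m)}_d$ and $p_d$ are non-negative and satisfy $p^{(m)}_d\to p_d$ as $m\to\infty$ and $\sum_{d\in\mathbb{Z}^n_+}p^{(m)}_d=\sum_{d\in\mathbb{Z}^n_+}p_d=1$;

(ii) $a^{(m)}_d$ and $a_d$ belong to $[0,1]$ and satisfy $a^{(m)}_d\to a_d$ as $m\to\infty$;

(iii) $c_d$ belongs to $[0,1]$.

Then, as $m\to\infty$,

$$\sum_{d\in\mathbb{Z}^n_+}p^{(m)}_d\,a^{(m)}_d\,c_d\to\sum_{d\in\mathbb{Z}^n_+}p_d\,a_d\,c_d.$$

For $d\in\mathbb{Z}^n_+$ and $\varepsilon\in(0,\varepsilon_0)$, let

$$\gamma_d(\varepsilon)=\begin{cases}\min\{1,(\varepsilon+\bar\eta(\varepsilon))/p_d\}&\text{if }p_d>0,\\1&\text{if }p_d=0.\end{cases}$$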




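Lemma 10. For all $\omega_1\in A_1$, $\lim_{m\to\infty}{}_{\varepsilon}b^{(m)}=b$ and $\lim_{m\to\infty}{}_{\varepsilon}\hat b^{(m)}={}_{\varepsilon}\hat b$, where $b=(b_0,b_1,\dots)$ is as in Section 6.2.3, and ${}_{\varepsilon}\hat b=({}_{\varepsilon}\hat b_0,{}_{\varepsilon}\hat b_1,\dots)$ is given by

$${}_{\varepsilon}\hat b_0=\sqrt{\eta(\varepsilon)}+\big(1-\sqrt{\eta(\varepsilon)}\big)\sum_{d\in\mathbb{Z}^n_+}\tilde p_d\Big(\gamma_d(\varepsilon)+\big(1-\gamma_d(\varepsilon)\big)P(\xi_d=0)\Big)$$

and

$${}_{\varepsilon}\hat b_k=\big(1-\sqrt{\eta(\varepsilon)}\big)\sum_{d\in\mathbb{Z}^n_+}\tilde p_d\big(1-\gamma_d(\varepsilon)\big)P(\xi_d=k)\qquad(k=1,2,\dots).$$

Proof. Note that, for $\omega_1\in A_1$, $\gamma^{(m)}_d(\omega_1)\to 0$ and $\gamma^{(m)}_d(\varepsilon,\omega_1)\to\gamma_d(\varepsilon)$ as $m\to\infty$ (for all $d$ with $p_d>0$). The required assertions then follow using Lemma 9. □

Remark. It is easily verified that $\sum_{k=0}^{\infty}{}_{\varepsilon}\hat b_k=1$, i.e. that ${}_{\varepsilon}\hat b$ is a proper probability distribution.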
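The limiting law ${}_{\varepsilon}\hat b$ of Lemma 10 and the survival probability $P({}_{\varepsilon}\bar X=\infty)$ of ${}_{\varepsilon}X\sim\mathrm{BP}(1,b,{}_{\varepsilon}\hat b)$ can be computed from household-level inputs. The sketch below uses a hypothetical two-type toy (the weights $p_d$, $\tilde p_d$ and the laws of $\xi_d$, $\tilde\xi_d$ are invented for illustration, not taken from the paper) and takes $\gamma_d(\varepsilon)$ equal across types for simplicity. It builds ${}_{\varepsilon}\hat b$ exactly as in Lemma 10, finds the extinction probability $q$ of the typical-birth law as the smallest fixed point of its PGF by monotone iteration from 0, and returns $1-\sum_k b_kq^k$.

```python
import math

def mixture(weights, pmfs):
    """k -> sum_d weights[d] * pmfs[d][k], with pmfs given as lists over k = 0, 1, ..."""
    kmax = max(len(p) for p in pmfs.values())
    out = [0.0] * kmax
    for d, w in weights.items():
        for k, pk in enumerate(pmfs[d]):
            out[k] += w * pk
    return out

def eps_hat_b(tilde_p, xi_pmf, eta, gamma):
    """Lemma 10 form: abort w.p. sqrt(eta); else d ~ tilde_p, abort w.p. gamma[d], else xi_d."""
    r = math.sqrt(eta)
    thinned = {d: w * (1 - gamma[d]) for d, w in tilde_p.items()}
    hat_b = [(1 - r) * x for x in mixture(thinned, xi_pmf)]
    hat_b[0] += r + (1 - r) * sum(w * gamma[d] for d, w in tilde_p.items())
    return hat_b

def pgf(pmf, s):
    return sum(pk * s**k for k, pk in enumerate(pmf))

def extinction_prob(pmf, tol=1e-14, max_iter=100_000):
    """Smallest root of s = f(s) in [0, 1] via s_{j+1} = f(s_j), starting from s_0 = 0."""
    s = 0.0
    for _ in range(max_iter):
        s_next = pgf(pmf, s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

def survival_prob(b, hat_b):
    """P(total progeny infinite) for BP(1, b, hat_b): ancestor law b, later births hat_b."""
    return 1.0 - pgf(b, extinction_prob(hat_b))

# Hypothetical toy inputs with two household types "A" and "B".
p = {"A": 0.6, "B": 0.4}
tilde_p = {"A": 0.4, "B": 0.6}
tilde_xi_pmf = {"A": [0.2, 0.4, 0.4], "B": [0.05, 0.25, 0.35, 0.35]}
xi_pmf = {"A": [0.3, 0.4, 0.3], "B": [0.1, 0.3, 0.3, 0.3]}

b = mixture(p, tilde_xi_pmf)
for eta, g in ((0.0, 0.0), (0.0025, 0.02), (0.01, 0.05)):
    hb = eps_hat_b(tilde_p, xi_pmf, eta, {"A": g, "B": g})
    print(eta, g, round(sum(hb), 12), round(survival_prob(b, hb), 6))
```

In this toy, shrinking $\eta(\varepsilon)$ and $\gamma_d(\varepsilon)$ increases the survival probability towards its unperturbed value, which mirrors (numerically, not as a proof) the $\varepsilon\downarrow 0$ step used later in the proof of Theorem 2.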

Recall the definition of $\varepsilon_0$ in the paragraph preceding Lemma 8 and, for $\varepsilon\in(0,\varepsilon_0)$, let ${}_{\varepsilon}X=({}_{\varepsilon}X_k,\,k=0,1,\dots)\sim\mathrm{BP}(1,b,{}_{\varepsilon}\hat b)$. Let ${}_{\varepsilon}\bar X$ denote the total progeny of ${}_{\varepsilon}X$, excluding the ancestor. Let $(\bar X,\bar X_A)$ denote the total progeny of the branching process $(X,X_A)$ (defined at the end of Section 6.2.3), including the ancestor. Also let $\bar X^{(m)}_A=\sum_{i=0}^{\infty}X^{(m)}_{A,i}$, so that $(\bar X^{(m)},\bar X^{(m)}_A)$ is the total progeny of $(X^{(m)},X^{(m)}_A)$.

Lemma 11. (i) For all $\omega_1\in A_1$,

(a) $\lim_{m\to\infty}P_{D(\omega_1)}\big(\bar X^{(m)}+\bar X^{(m)}_A=k\big)=P(\bar X+\bar X_A=k)$ $(k=1,2,\dots)$;

(b) $\lim_{m\to\infty}P_{D(\omega_1)}\big(\bar X^{(m)}+\bar X^{(m)}_A=\infty\big)=P(\bar X=\infty)$.

(ii) For all $\omega_1\in A_1$ and $\varepsilon\in(0,\varepsilon_0)$,

(a) $\lim_{m\to\infty}P_{D(\omega_1)}\big({}_{\varepsilon}\bar X^{(m)}=k\big)=P({}_{\varepsilon}\bar X=k)$ $(k=1,2,\dots)$;

(b) $\lim_{m\to\infty}P_{D(\omega_1)}\big({}_{\varepsilon}\bar X^{(m)}=\infty\big)=P({}_{\varepsilon}\bar X=\infty)$.

Proof. For all $\omega_1\in A_1$ and $d\in\mathbb{Z}^n_+$, $p^{(m)}_d(\omega_1)\to p_d$ and $\tilde p^{(m)}_d(\omega_1)\to\tilde p_d$ as $m\to\infty$, so, using Scheffé's theorem, $b^{(m)}(\omega_1)\to b$ and $\hat b^{(m)}(\omega_1)\to\hat b$ as $m\to\infty$. Part (i)(b) then follows using Lemma 2(ii) and noting that, almost surely, $\bar X^{(m)}+\bar X^{(m)}_A=\infty$ if and only if $\bar X^{(m)}=\infty$. A similar argument shows that, for all $\omega_1\in A_1$, the offspring laws of $(X^{(m)},X^{(m)}_A)$ converge to those of $(X,X_A)$ as $m\to\infty$. Part (i)(a) then follows from the extension of Lemma 2(i) to two-type branching processes. Part (ii) of the lemma follows immediately from Lemmas 10 and 2. □

6.5.2 Relative final size of a major outbreak. 

For $m=1,2,\dots$, let $B^{(m)}$ be the event that an individual chosen uniformly at random from all individuals that are susceptible at time $t_m$ in the forward process is ultimately infected by the epidemic $E^{(m)}$. Thus, if $A^{(m)}$ denotes the set of global neighbours of the susceptibility set $S^{(m)}$ of this individual, then $B^{(m)}$ occurs if and only if one of the 'live' half-edges from the forward process is paired in the construction of $S^{(m)}\cup A^{(m)}$. Recall the working definition of a major outbreak, viz. $\tilde G^{(m)}=\{Z^{(m)}_{t_m}>\log m,\ T^{(m)}_{t_m+1}<(\log m)^{\beta}\}$, where $\beta$ is as in Lemma 5.

Theorem 2. For all $\omega_1\in A_1$,

$$\lim_{m\to\infty}P_{D(\omega_1)}\big(B^{(m)}\mid\tilde G^{(m)}\big)=P(\bar X=\infty).$$

Proof. For $m=1,2,\dots$, let $T^{(m)}_p$ be the number of half-edge pairings made in the construction of $S^{(m)}\cup A^{(m)}$ until one of the $Z^{(m)}_{t_m}$ live half-edges from the forward process is chosen. In determining $T^{(m)}_p$ it is assumed that, if necessary, the pairings continue after $S^{(m)}\cup A^{(m)}$ goes extinct, and that $T^{(m)}_p$ includes the pairing when the first live half-edge is chosen.

Fix $\omega_1\in A_1$. First we obtain an upper bound for $P_{D(\omega_1)}(B^{(m)}\mid\tilde G^{(m)})$. For all fixed $k\in\mathbb{N}$,

$$1-P_{D(\omega_1)}\big(B^{(m)}\mid\tilde G^{(m)}\big)\ge P_{D(\omega_1)}\big(T^{(m)}_p>k,\ \bar X^{(m)}+\bar X^{(m)}_A\le k,\ T^{(m)}_f>k\mid\tilde G^{(m)}\big),\qquad(6.16)$$

where $T^{(m)}_f$ is the number of households in the construction of $S^{(m)}\cup A^{(m)}$ when the first bad half-edge is chosen. Note that $T^{(m)}_f=1$ if the initial individual in $(X^{(m)},X^{(m)}_A)$ belongs to the set of bad households at time $t_m$ in the forward process. Given $\tilde G^{(m)}$, the number of such bad households is less than $(\log m)^{\beta}$, so $P_{D(\omega_1)}(T^{(m)}_f=1\mid\tilde G^{(m)})\to 0$ as $m\to\infty$. Arguing as in the proof of Theorem 1 then shows that, for all $k\in\mathbb{N}$,

$$\lim_{m\to\infty}P_{D(\omega_1)}\big(T^{(m)}_f>k\mid\tilde G^{(m)}\big)=1.\qquad(6.17)$$

Let $Q^{(m)}$ denote the number of half-edges used up to time $t_m$ in the forward process. Now, for all $k\in\mathbb{N}$,

$$P_{D(\omega_1)}\big(T^{(m)}_p>k\mid\tilde G^{(m)},Q^{(m)},Z^{(m)}_{t_m}\big)=\prod_{i=1}^{k}\left(1-\frac{Z^{(m)}_{t_m}}{m\mu^{(m)}_H(\omega_1)-Q^{(m)}-2(i-1)}\right)\qquad(6.18)$$

and, since we have conditioned on $\tilde G^{(m)}$, $Q^{(m)}<2(\log m)^{\beta}$ and $Z^{(m)}_{t_m}<2(\log m)^{\beta}$. It then follows from (6.18) that

$$\lim_{m\to\infty}P_{D(\omega_1)}\big(T^{(m)}_p>k\mid\tilde G^{(m)}\big)=1\qquad(6.19)$$

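for all $k\in\mathbb{N}$. Letting $m\to\infty$ in (6.16), using (6.17) and (6.19), and noting that $\bar X^{(m)}+\bar X^{(m)}_A$ and $\tilde G^{(m)}$ are conditionally independent given $D(\omega_1)$, yields, for all $k\in\mathbb{N}$,

$$\limsup_{m\to\infty}P_{D(\omega_1)}\big(B^{(m)}\mid\tilde G^{(m)}\big)\le\limsup_{m\to\infty}P_{D(\omega_1)}\big(\bar X^{(m)}+\bar X^{(m)}_A>k\big)=P(\bar X+\bar X_A>k),$$

using Lemma 11(i)(a). Letting $k\to\infty$ then yields

$$\limsup_{m\to\infty}P_{D(\omega_1)}\big(B^{(m)}\mid\tilde G^{(m)}\big)\le P(\bar X=\infty).\qquad(6.20)$$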

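Now we obtain a lower bound for $P_{D(\omega_1)}(B^{(m)}\mid\tilde G^{(m)})$. First note, using (6.18), that for any $\varepsilon\in(0,1)$ we have

$$P_{D(\omega_1)}\big(T^{(m)}_p>\lfloor\varepsilon m\rfloor\mid\tilde G^{(m)}\big)\le\left(1-\frac{\log m}{m\mu^{(m)}_H(\omega_1)}\right)^{\lfloor\varepsilon m\rfloor}\le\exp\left(-\frac{\lfloor\varepsilon m\rfloor\log m}{m\mu^{(m)}_H(\omega_1)}\right).$$

Now, $\mu^{(m)}_H(\omega_1)\to n\mu_D$ as $m\to\infty$ (since $\omega_1\in A_1$), so $\lfloor\varepsilon m\rfloor\log m/(m\mu^{(m)}_H(\omega_1))\to\infty$ as $m\to\infty$, whence

$$\lim_{m\to\infty}P_{D(\omega_1)}\big(T^{(m)}_p\le\lfloor\varepsilon m\rfloor\mid\tilde G^{(m)}\big)=1.\qquad(6.21)$$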

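Also note that, since $S^{(m)}$ is obviously contained in $S^{(m)}\cup A^{(m)}$,

$$P_{D(\omega_1)}\big(B^{(m)}\mid\tilde G^{(m)}\big)\ge P_{D(\omega_1)}\big(T^{(m)}_p\le\lfloor\varepsilon m\rfloor,\ W^{(m)}\ge\lfloor\varepsilon m\rfloor\mid\tilde G^{(m)}\big)\qquad(6.22)$$

for any $\varepsilon\in(0,1)$. Thus, using (6.22) and (6.21), then (6.15) and Lemma 11(ii)(b), for any $\varepsilon\in(0,\varepsilon_0)$,

$$\liminf_{m\to\infty}P_{D(\omega_1)}\big(B^{(m)}\mid\tilde G^{(m)}\big)\ge\liminf_{m\to\infty}P_{D(\omega_1)}\big(W^{(m)}\ge\lfloor\varepsilon m\rfloor\mid\tilde G^{(m)}\big)\ge\liminf_{m\to\infty}P_{D(\omega_1)}\big({}_{\varepsilon}\bar X^{(m)}=\infty\big)=P({}_{\varepsilon}\bar X=\infty).\qquad(6.23)$$

It is easily verified, using the dominated convergence theorem, that ${}_{\varepsilon}\hat b\to\hat b$ as $\varepsilon\downarrow 0$, so letting $\varepsilon\downarrow 0$ in (6.23) and using Lemma 2(ii) yields

$$\liminf_{m\to\infty}P_{D(\omega_1)}\big(B^{(m)}\mid\tilde G^{(m)}\big)\ge P(\bar X=\infty),$$

which, together with (6.20), establishes the assertion of the theorem. □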
For $m=1,2,\dots$, let $\bar Z^{(m)}_k$ be the total number of individuals infected by time $k$ in the forward epidemic process $(k=0,1,\dots)$ and let $\bar Z^{(m)}$ denote the total number of individuals who are ultimately infected in $E^{(m)}$.

Corollary 2. (i) For all $\omega_1\in A_1$, $\lim_{m\to\infty}\frac{1}{mn}E_{D(\omega_1)}\big[\bar Z^{(m)}\mid\tilde G^{(m)}\big]=P(\bar X=\infty)$;

(ii) $\lim_{m\to\infty}\frac{1}{mn}E\big[\bar Z^{(m)}\mid\tilde G^{(m)}\big]=P(\bar X=\infty)$.

Proof. Fix $\omega_1\in A_1$. For $m=1,2,\dots$, let $X_{t_m}$ denote the number of susceptible individuals at time $t_m$ in the forward process, and label these individuals $1,2,\dots,X_{t_m}$. Then

$$\bar Z^{(m)}=\bar Z^{(m)}_{t_m}+\sum_{i=1}^{X_{t_m}}1_{\{i\text{ ultimately infected}\}}.$$

Given the occurrence of $\tilde G^{(m)}$, $\bar Z^{(m)}_{t_m}<2n(\log m)^{\beta}$ and $X_{t_m}>nm-2n(\log m)^{\beta}$. Thus

$$\lim_{m\to\infty}\frac{1}{mn}E_{D(\omega_1)}\big[\bar Z^{(m)}\mid\tilde G^{(m)}\big]=\lim_{m\to\infty}P_{D(\omega_1)}\big(B^{(m)}\mid\tilde G^{(m)}\big)$$

and assertion (i) follows using Theorem 2. Assertion (ii) then follows by the dominated convergence theorem. □

Finally, note from the discussion at the end of Section 6.4.2 that Corollary 2 holds with $\tilde G^{(m)}$ replaced by $G^{(m)}$, where $G^{(m)}$ is the event that the epidemic $E^{(m)}$ infects at least $\log m$ households.



7 Concluding comments 

We have analysed the spread of an SIR epidemic in a population whose structure departs significantly from traditional homogeneous mixing: it has a local household structure, and potential 'global' contacts are modelled by a random network with an arbitrary degree distribution (with finite variance). Rigorous limit theorems were obtained, valid as the number of households $m\to\infty$, from which one can determine the probability of a major outbreak and the expected relative final size of such an outbreak. The potential usefulness of these results was demonstrated by showing, numerically, that these asymptotic results provide good approximations for the behaviour of moderately sized finite populations.
As stated in Section 2, our results easily generalise to allow for unequal household sizes. For example, in a variable household size framework we can decompose $R_*$ as $R_*=\sum_{n=1}^{\infty}\tilde p_nR_*^{(n)}$, where $\tilde p_n$ is the size-biased proportion of households of size $n$ and $R_*^{(n)}$ is the threshold parameter $R_*$ in the case of a fixed household size $n$. (The size-biasing of $\tilde p_n$ arises because, if a proportion $p_n$ of households are of size $n$, then an individual chosen uniformly at random is in a household of size $n$ with probability proportional to $np_n$; thus we require $\sum_{n=1}^{\infty}np_n<\infty$.) Full details of this generalisation will appear in a forthcoming paper, which will discuss our model from a more applied viewpoint.
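The size-biased weighting in this decomposition is simple arithmetic, sketched below. The household-size distribution $p_n$ and, in particular, the per-size thresholds $R_*^{(n)}$ are hypothetical placeholders; in practice each $R_*^{(n)}$ would come from the fixed-household-size analysis, which is not reproduced here.

```python
def size_biased(p):
    """Map household-size proportions {n: p_n} to {n: n p_n / sum_j j p_j}."""
    total = sum(n * pn for n, pn in p.items())  # must be finite (sum_n n p_n < infinity)
    return {n: n * pn / total for n, pn in p.items()}

def R_star_mixture(p, R_star_by_size):
    """R_* = sum_n tilde p_n R_*^{(n)}, with tilde p_n the size-biased proportions."""
    tp = size_biased(p)
    return sum(tp[n] * R_star_by_size[n] for n in p)

p = {1: 0.3, 2: 0.3, 3: 0.2, 4: 0.2}                    # hypothetical household-size mix
R_star_by_size = {1: 0.8, 2: 1.1, 3: 1.4, 4: 1.7}       # hypothetical placeholder values
print(size_biased(p))
print(R_star_mixture(p, R_star_by_size))
```

Note how size-biasing shifts weight towards larger households: here households of size 4 make up 20% of households but contain about 35% of individuals.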

Another condition that we have required is that the variance, $\sigma_D^2$, of the degree distribution is finite. Whilst this is necessary for all of our proofs, the PGFs of $\tilde C$, $C$, $B$ and $\hat B$ are all well defined so long as $\mu_D<\infty$, and numerical studies (along the lines of those encompassed by Figure 3) indicate that our methods at least give good approximations when $\sigma_D^2=\infty$. This is particularly relevant in light of several of the studies cited by Newman (2003, Section III.C), which suggest that degree distributions which asymptotically follow some power law are appropriate models in some real-world situations. We note, however, that when $\sigma_D^2=\infty$ it is not known (to our knowledge) whether self-loops and parallel edges remain sufficiently sparse in the network, so the argument that our results continue to hold if we condition on there being no such imperfections (second paragraph of Section 2) may not be valid.

Of course, there are other features of our model that in many circumstances will be unrealistic. In particular, the method of construction of the random graph (pairing the half-edges uniformly at random) ensures not only that there are (asymptotically) very few 1-cycles (self-loops) and 2-cycles (parallel edges) in the resulting multigraph, but also that there are very few 3-cycles (triangles). Thus, in the asymptotic model that we analyse, individuals have no mutual acquaintances outside their household, which is unrealistic. Similarly, the random graph model has very few edges which join individuals in the same pair of households, i.e. the acquaintances of two individuals are, with probability close to 1, all in distinct households. This stems from the construction of the random graph: although there is heterogeneity amongst the individuals (through differing degrees), the uniformly random pairing of half-edges means that the mixing is still homogeneous, and this is critical for the branching process approximations. In this sense it seems fair to say that our model incorporates some heterogeneity of both the individuals in the population (via the differing degrees of individuals and varying household sizes) and their mixing (having both local and global infection).

Nevertheless, our model does capture some important heterogeneities which are present in real populations and which doubtless have a significant effect on the spread of disease through these populations. Some additional features, such as allowing the degree distribution $D$ or the infection rates $\lambda_L$ and $\lambda_G$ to depend on household size, or incorporating correlation between the degrees of individuals within the same household, can in principle be included in our model relatively simply, though the calculations quickly become very cumbersome.

The usual approach for obtaining fully rigorous results concerning the final size of a major epidemic on a random network is via the existence and uniqueness of a giant component in an associated bond percolation model (see e.g. Britton et al. (2007) and the discussion in Section 4 of Britton et al. (2008)). This requires that the infectious period be constant (though see Kenah and Robins (2007)), as well as fully rigorous results concerning the component structure of the percolation model, which may not be easy to prove. We have developed a different approach, which does not require a constant infectious period. Although this is not the focus of the paper, it seems plausible that our methods can be used to prove existence and uniqueness of a giant component for our random network (and indeed for other network models), and that they might also be applicable to epidemics on other random graph models, such as the random intersection graph considered by Britton et al. (2008).

Further study of this model will include an analysis of the effect of vacci- 
nation on epidemic spread (work ongoing) and it seems likely that a central 
limit theorem for the final size of a major outbreak might be derived using 
methods similar to those of Ball and Neal (2008). 